bound. The addressed metrics are evaluated via their expected values. As an application,
we show how the distributed optimization algorithm can be used to perform collaborative
system identification and provide numerical experiments under the randomized and
broadcast gossip protocols.

Second, we generalize the asymptotic consensus problem to convex metric spaces.
Under minimal connectivity assumptions, we show that if at each iteration an agent updates
its state by choosing a point from a particular subset of the generalized convex hull
generated by the agent's current state and the states of its neighbors, then agreement is
achieved asymptotically. In addition, we give bounds on the distance between the consensus
point(s) and the initial values of the agents. As an application example, we introduce
a probabilistic algorithm for reaching consensus of opinion and show that it fits
our general framework.

Third, we discuss the linear asymptotic consensus problem for a network of dynamic
agents whose communication network is modeled by a randomly switching graph.
The switching is determined by a finite-state Markov process, each topology corresponding
to a state of the process. We address both the case where the dynamics of the agents
are expressed in continuous time and the case where they are expressed in discrete time.
We show that, if the consensus matrices are doubly stochastic, average consensus is
achieved in the mean square and almost sure senses if and only if the graph resulting
from the union of graphs corresponding to the states of the Markov process is strongly
connected.

Fourth, we address the consensus-based distributed linear filtering problem, where
a discrete time, linear stochastic process is observed by a network of sensors. We assume
that the consensus weights are known and we first provide sufficient conditions under
which the stochastic process is detectable, i.e. for a specific choice of consensus weights
there exists a set of filtering gains such that the dynamics of the estimation errors
(without noise) are asymptotically stable. Next, we develop a distributed, sub-optimal
filtering scheme based on minimizing an upper bound on a quadratic filtering cost. In the
stationary case, we provide sufficient conditions under which this scheme converges;
these conditions are expressed in terms of the convergence properties of a set of coupled
Riccati equations. We continue by presenting a connection between the consensus-based
distributed linear filter and the optimal linear filter of a Markovian jump linear system,
appropriately defined. More specifically, we show that if the Markovian jump linear system
is (mean square) detectable, then the stochastic process is detectable under the
consensus-based distributed linear filtering scheme. We also show that the optimal gains
of a linear filter for estimating the state of a Markovian jump linear system,
appropriately defined, can be used to approximate the optimal gains of the consensus-based
linear filter.

Chapter 1


Introduction 

This  chapter  serves  as  an  introduction  to  the  rest  of  the  thesis,  by  providing  the 
motivation  for  the  current  work.  Moreover,  it  introduces  the  problems  that  are  addressed 
and  our  contributions. 

1.1  Motivation 

In the following chapters we address problems related to multi-agent optimization
and filtering. We design and analyze distributed algorithms which are based on the
asymptotic consensus/agreement algorithm for performing localized (i.e. using only
information from neighbors) computations. A consensus problem consists of a group of
dynamic agents who seek to agree upon certain quantities of interest by exchanging
information among themselves according to a set of rules. This problem can model many
phenomena involving information exchange between agents such as cooperative control of
vehicles, formation control, flocking, synchronization, parallel computing, etc.
Distributed computation over networks has a long history in control theory starting with
the work of Borkar and Varaiya [5], Tsitsiklis, Bertsekas and Athans [51, 52] on
asynchronous agreement problems and parallel computing. A theoretical framework for
solving consensus problems was introduced by Olfati-Saber and Murray in [42, 43], while
Jadbabaie et al. studied alignment problems [18] for reaching an agreement. Relevant
extensions of the consensus problem were given by Ren and Beard [39], by Moreau in [29]
or, more recently, by Nedic and Ozdaglar in [32, 33] or by Olshevsky and Tsitsiklis in [36].

Typically, agents are connected via a network that changes with time due to link
failures, packet drops, node failures, etc. Such variations in topology can happen randomly,
which motivates the investigation of consensus problems under a stochastic framework.
Hatano and Mesbahi consider in [17] an agreement problem over random information
networks, where the existence of an information channel between a pair of elements at
each time instance is probabilistic and independent of other channels. In [38], Porfiri and
Stilwell provide sufficient conditions for reaching consensus almost surely in the case
of a discrete linear system, where the communication flow is given by a directed graph
derived from a random graph process, independent of other time instances. Under a
similar model of the communication topology, Tahbaz-Salehi and Jadbabaie give necessary
and sufficient conditions for almost sure convergence to consensus in [44], while in [45],
the authors extend the applicability of their necessary and sufficient conditions to strictly
stationary ergodic random graphs.

The consensus algorithm proves to be a useful tool for solving optimization and
estimation problems in a distributed manner. Multi-agent distributed optimization problems
appear naturally in many distributed processing problems (such as network resource
allocation, collaborative control and estimation, etc.), where the optimization cost is a
convex function which is not necessarily separable. A distributed subgradient method for
multi-agent optimization of a sum of convex functions was proposed in [33], where each
agent has only local knowledge of the optimization cost, i.e. knows only one term of the
sum. The agents exchange information according to a communication topology, modeled as an
undirected, time varying graph, which defines the communication neighborhoods of the
agents. The agents maintain estimates of the optimal decision vector, which are updated
in two stages. The first stage consists of a consensus step among the estimates of an
agent and its neighbors. In the second stage, the result of the consensus step is updated
in the direction of a subgradient of the local knowledge of the optimization cost. Another
multi-agent subgradient method was proposed in [20], where the communication topology
is assumed time invariant and where the order of the two stages mentioned above is
inverted.

A fundamental problem in sensor networks is developing distributed algorithms for
the state estimation of a process of interest. Generically, a process is observed by a group
of (mobile) sensors organized in a network. The goal of each sensor is to compute accurate
state estimates. The distributed filtering (estimation) problem has received a lot of
attention during the past thirty years. An important contribution was made by Borkar and
Varaiya [5], who addressed the distributed estimation problem of a random variable by a
group of sensors. The particularity of their formulation is that both estimates and
measurements are shared among neighboring sensors. The authors show that if the sensors
form a communication ring, through which information is exchanged infinitely often, then
the estimates converge asymptotically to the same value, i.e. they asymptotically agree.
An extension of the results in reference [5] is given in [50]. The recent technological
advances in mobile sensor networks have re-ignited the interest in the distributed estimation
problem. Most papers focusing on distributed estimation propose different mechanisms
for combining the Kalman filter with a consensus filter in order to ensure that the
estimates asymptotically converge to the same value; these schemes will henceforth be
called consensus-based distributed filtering (estimation) algorithms. In [41] and [40],
several algorithms based on the idea mentioned above are introduced. In [8], the authors
study the interaction between the consensus matrix, the number of messages exchanged
per sampling time, and the Kalman gain for scalar systems. It is shown that optimizing
the consensus matrix for fastest convergence and using the centralized optimal gain is
not necessarily the optimal strategy if the number of exchanged messages per sampling
time is small. In [48], the weights are adaptively updated to minimize the variance of the
estimation error. Both the estimation and the parameter optimization are performed in a
distributed manner. The authors derive an upper bound on the error variance at each node
which decreases with the number of neighboring nodes.

1.2  Contributions  of  the  thesis 

Our contributions are as follows. In Chapter 2 we study the performance metrics
(rate of convergence and guaranteed region of convergence) of the consensus-based,
multi-agent subgradient method proposed in [33], for the case of a constant stepsize. The
communication among agents is modeled by a random graph, independent of other time
instances, and the performance metrics are viewed in the expectation sense. Random
graphs are suitable models for networks that change with time due to link failures, packet
drops, node failures, etc. Our focus is on providing upper bounds on the performance
metrics, which explicitly depend on the probability distribution of the random graph. The
explicit dependence on the probability distribution allows us to determine the optimal
probability distributions in the sense that they would ensure the best guaranteed upper
bounds on the performance metrics. As an example of possible applications of our results,
we address a scenario where the goal is to tune the communication protocol parameters
of a wireless network so that the performance of the multi-agent subgradient method is
improved, in the context of a distributed parametric system identification application.

In Chapter 2 we emphasize the effect and importance of the agreement step in solving
an optimization problem distributively. It is often the case that we need to solve
optimization problems that go beyond the $\mathbb{R}^n$ setup. In [47], the authors formulate
optimization problems for the trusted routing problem under a semiring framework. In
[28, 27], the popular particle swarm optimization algorithm is extended to combinatorial
spaces, such as Euclidean, Manhattan, and Hamming spaces. Related to the distributed
optimization algorithm introduced in Chapter 2, a first step to extend the applicability of
the algorithm is to formulate and analyze the agreement problem in more general spaces.
Consequently, in Chapter 3 we generalize the asymptotic consensus problem to the more
general case of convex metric spaces and emphasize the fundamental role of the
generalized notion of convexity and in particular of the generalized convex hull of a finite
set of points. Tsitsiklis showed in [51] that, under some minimal connectivity assumptions
on the communication network, if an agent updates its value by choosing a point from
the (interior of the) convex hull of its current value and the current values of its
neighbors, then asymptotic convergence to consensus is achieved. We will show that this idea
extends naturally to the case of convex metric spaces. As an application we present a
probabilistic consensus of opinion algorithm and show that it fits our general framework
for a particular convex metric space.

In Chapter 2 we assume that the communication topology, which dictates how the
consensus step is performed, is modeled by a random graph, independent of other time
instances. In Chapter 4, we generalize the communication model and study the linear
consensus problem where the communication flow between agents is modeled by a (possibly
directed) switching random graph. The switching is determined by a homogeneous,
finite-state Markov chain, each communication pattern corresponding to a state of the
Markov process. We address both the continuous and discrete time cases and, under
certain assumptions on the matrices involved in the linear scheme, we give necessary and
sufficient conditions such that average consensus is achieved in the mean square sense
and in the almost sure sense. The Markovian switching model goes beyond the common
i.i.d. assumption on the random communication topology and appears in the cases
where Rayleigh fading channels are considered. Our aim is to show how mathematical
techniques used in the stability analysis of Markovian jump linear systems, together
with results inspired by matrix and graph theory, can be used to prove (intuitively clear)
convergence results for the (linear) stochastic consensus problem.

In Chapter 5 we address the consensus-based distributed linear filtering problem.
We assume that each agent updates its (local) estimate in two steps. In the first step, an
update is produced using a Luenberger observer type of filter. In the second step, called
the consensus step, every sensor computes a convex combination between its local update
and the updates received from the neighboring sensors. For given consensus weights, we
will first give sufficient conditions for the existence of filter gains such that the dynamics
of the estimation errors (without noise) are asymptotically stable. Next, we present a
distributed, sub-optimal filtering algorithm, valid for time varying topologies as well,
resulting from minimizing an upper bound on a quadratic cost expressed in terms of the
covariance matrices of the estimation errors. We will also present a connection between
the consensus-based linear filter and the linear filtering of a Markovian jump linear system
appropriately defined, a connection which was inspired by our previous work on state
estimation for switching systems (see for instance [24], [25]).

Chapter 2


Distributed Optimization under Random Communication Topologies

2.1 Introduction

We  investigate  the  collaborative  optimization  problem  in  a  multi-agent  setting,  when 
the  agents  make  decisions  in  a  distributed  manner  using  local  information,  while  the 
communication  topology  used  to  exchange  messages  and  information  is  modeled  by  a 
graph-valued  random  process,  assumed  independent  and  identically  distributed  (i.i.d.). 
Specifically,  we  study  the  performance  of  the  consensus-based  multi-agent  distributed 
subgradient  method  proposed  in  [33],  for  the  case  of  a  constant  stepsize. 

Random graphs are suitable models for networks that change with time due to link
failures, packet drops, node failures, etc. An analysis of the multi-agent subgradient
method under random communication topologies is given in [22]. The authors assume
that the consensus weights are lower bounded by some positive scalar and give
upper bounds on the performance metrics as functions of this scalar and other parameters
of the problem. More precisely, the authors give upper bounds on the difference between
the cost function and the optimal value (in expectation), where the cost is evaluated
at the (weighted) time average of the optimal decision vector's estimate. Our main goal
is to provide upper bounds on the performance metrics, which explicitly depend on the
probability distribution of the random graph. We first derive an upper bound on the
difference between the cost function, evaluated at the estimate, and the optimal value.
Next, for a particular class of convex functions, we focus on the distance between the
estimate of the optimal decision and the minimizer. The upper bound we provide has a
constant component and a time varying component. For the latter, we provide the rate
of convergence to zero. The performance metrics are evaluated via their expected values.
The explicit dependence on the graph's probability distribution may be useful to
design probability distributions that would ensure the best guaranteed upper bounds on
the performance metrics. This idea is especially relevant in wireless networks,
where the communication topology has a random nature with a probability distribution
(partially) determined by the communication protocol parameters (the reader can consult
[21, 35], where the authors introduce probabilistic models for successful transmissions as
functions of the transmission powers). As an example of a possible application, we show
how the distributed optimization algorithm can be used to perform collaborative system
identification and we present numerical results under the randomized [7] and
broadcast [1] gossip protocols. Performance metrics similar to ours are studied in [2],
where the authors generalize the randomized incremental subgradient method and where
the stochastic component in the algorithm is described by a Markov chain, which can be
constructed in a distributed fashion using local information only. Newer results on the
distributed optimization problem can be found in [13], where the authors analyze distributed
algorithms based on dual averaging of subgradients, and provide sharp bounds on their
convergence rates as a function of the network size and topology.

Notations: Let $X$ be a subset of $\mathbb{R}^n$ and let $y$ be a point in $\mathbb{R}^n$. By slight abuse
of notation, let $\|y - X\|$ denote the distance from the point $y$ to the set $X$, i.e.
$\|y - X\| = \min_{x \in X} \|y - x\|$, where $\|\cdot\|$ is the standard Euclidean norm. For a twice
differentiable function $f(x)$, we denote by $\nabla f(x)$ and $\nabla^2 f(x)$ the gradient and Hessian
of $f$ at $x$, respectively. Given a symmetric matrix $A$, by ($A \succeq 0$) $A \succ 0$ we understand
that $A$ is positive (semi)definite. The symbol $\otimes$ represents the Kronecker product.

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a convex function. We denote by $\partial f(x)$ the subdifferential of $f$
at $x$, i.e. the set of all subgradients of $f$ at $x$:

$$\partial f(x) = \{ d \in \mathbb{R}^n \mid f(y) \ge f(x) + d'(y - x), \ \forall y \in \mathbb{R}^n \}. \tag{2.1}$$

Let $\epsilon \ge 0$ be a nonnegative real number. We denote by $\partial_\epsilon f(x)$ the $\epsilon$-subdifferential of $f$ at
$x$, i.e. the set of all $\epsilon$-subgradients of $f$ at $x$:

$$\partial_\epsilon f(x) = \{ d \in \mathbb{R}^n \mid f(y) \ge f(x) + d'(y - x) - \epsilon, \ \forall y \in \mathbb{R}^n \}. \tag{2.2}$$

The gradient of the differentiable function $f(x)$ on $\mathbb{R}^n$ satisfies a Lipschitz condition with
constant $L$ if

$$\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|, \quad \forall x, y \in \mathbb{R}^n.$$

The differentiable, convex function $f(x)$ on $\mathbb{R}^n$ is strongly convex with constant $l$ if

$$f(y) \ge f(x) + \nabla f(x)'(y - x) + \frac{l}{2}\|y - x\|^2, \quad \forall x, y \in \mathbb{R}^n.$$

We will denote by LEM and SLEM the largest and second largest eigenvalue in modulus
of a matrix, respectively. We will use CBMASM as the abbreviation for Consensus-Based
Multi-Agent Subgradient Method and pmf for probability mass function.
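As a small illustration of definitions (2.1) and (2.2), the following worked example computes both sets for a scalar function. The choice $f(x) = |x|$ is ours and is not taken from the thesis; it is only meant to make the definitions concrete.

```latex
% Illustrative example (our choice of f, not from the source): f(x) = |x| on R.
% At x = 0, (2.1) requires |y| >= d*y for all y, which holds iff |d| <= 1:
\[
  \partial f(0) = [-1,\, 1].
\]
% At x = 1, f is differentiable, so the subdifferential is the singleton gradient:
\[
  \partial f(1) = \{1\}.
\]
% For the epsilon-subdifferential at x = 1, (2.2) requires
%   |y| - d*y >= 1 - d - epsilon  for all y.
% The left-hand side is unbounded below unless |d| <= 1, in which case its
% minimum over y is 0. Hence the condition reduces to d >= 1 - epsilon:
\[
  \partial_\epsilon f(1) = \bigl[\max\{-1,\ 1 - \epsilon\},\ 1\bigr],
  \qquad \epsilon \ge 0 .
\]
% For epsilon = 0 this recovers the ordinary subdifferential {1}.
```

The example shows that the $\epsilon$-subdifferential enlarges the subdifferential as $\epsilon$ grows and coincides with it when $\epsilon = 0$.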

Chapter structure: Section 2.2 contains the problem formulation. In Section 2.3 we
introduce a set of preliminary results, which mainly consist of providing upper bounds for
a number of quantities of interest. Using these preliminary results, in Section 2.4 we give
upper bounds for the expected value of two performance metrics: the difference between
the cost function evaluated at the estimate and the optimal value, and the (squared)
distance between the estimate and the minimizer. Section 2.5 shows how the distributed
optimization algorithm can be used for collaborative system identification.

2.2  Problem  formulation 

2.2.1  Communication  model 

Consider a network of $N$ agents, indexed by $i = 1, \ldots, N$. The communication
topology is time varying and is modeled by a random graph $G(k) = (V, \mathcal{E}(k))$, where $V$ is the
set of $N$ vertices (nodes) and $\mathcal{E}(k) = (e_{ij}(k))$ is the set of edges, and where we used $k$ to
denote the time index. The edges in the set $\mathcal{E}(k)$ correspond to the communication links
among agents. Given a positive integer $M$, the graph $G(k)$ takes values in a finite set
$\mathcal{G} = \{G_1, G_2, \ldots, G_M\}$ at each $k$, where the graphs $G_i = (V, \mathcal{E}_i)$ are assumed undirected and
without self loops. In other words, we will consider only bidirectional communication
topologies. The underlying random process of $G(k)$ is assumed i.i.d. with probability
distribution $\Pr(G(k) = G_i) = p_i$, $\forall k \ge 0$, where $\sum_{i=1}^{M} p_i = 1$ and $p_i > 0$.

Assumption 2.2.1. (Connectivity assumption) The graph $\bar{G} = (V, \bar{\mathcal{E}})$ resulting from the
union of all graphs in $\mathcal{G}$ is connected, where

$$\bar{G} = \bigcup_{i=1}^{M} G_i = \Big( V, \ \bigcup_{i=1}^{M} \mathcal{E}_i \Big).$$
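The connectivity assumption only concerns the union of the edge sets, so it can hold even when no individual graph is connected. A minimal sketch of such a check is given below; the function name `union_is_connected` and the example edge sets are hypothetical and serve only as an illustration.

```python
from collections import deque


def union_is_connected(num_nodes, edge_sets):
    """Return True if the union of the given undirected edge sets is connected.

    Nodes are labeled 0, ..., num_nodes - 1 (a labeling assumed for this sketch).
    """
    adjacency = {v: set() for v in range(num_nodes)}
    for edges in edge_sets:
        for i, j in edges:
            adjacency[i].add(j)
            adjacency[j].add(i)
    # Breadth-first search from node 0 over the union graph.
    seen = {0}
    queue = deque([0])
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == num_nodes


# Three single-link graphs on 4 nodes; none of them is connected on its own.
graphs = [{(0, 1)}, {(1, 2)}, {(2, 3)}]
print(union_is_connected(4, graphs))      # True: the union is a path 0-1-2-3
print(union_is_connected(4, graphs[:2]))  # False: node 3 is never reached
```

Single-link graphs of this kind are exactly what arises under pairwise gossip, where each realization connects only two agents.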

Let $G$ be an undirected graph with $N$ nodes and no self loops and let $A \in \mathbb{R}^{N \times N}$
be a row stochastic matrix with positive diagonal entries. We say that the matrix $A$
corresponds to the graph $G$, or that the graph $G$ is induced by $A$, if any non-zero entry $(i, j)$ of
$A$ with $i \neq j$ implies a link from $j$ to $i$ in $G$, and vice versa.

2.2.2  Optimization  model 

The task of the $N$ agents consists of minimizing a convex function $f : \mathbb{R}^n \to \mathbb{R}$. The
function $f$ is expressed as a sum of $N$ functions, i.e.

$$f(x) = \sum_{i=1}^{N} f_i(x), \tag{2.3}$$

where $f_i : \mathbb{R}^n \to \mathbb{R}$ are convex. Formally expressed, the agents want to cooperatively solve
the following optimization problem

$$\min_{x \in \mathbb{R}^n} \sum_{i=1}^{N} f_i(x). \tag{2.4}$$

The fundamental assumption is that each agent $i$ has access only to the function $f_i$.

Let $f^*$ denote the optimal value of $f$ and let $X^*$ denote the set of optimizers of $f$,
i.e. $X^* = \{x \in \mathbb{R}^n \mid f(x) = f^*\}$. Let $x_i(k) \in \mathbb{R}^n$ designate the estimate of the optimal decision
vector of (2.4), maintained by agent $i$, at time $k$. The agents exchange estimates among
themselves subject to the communication topology described by the random graph $G(k)$.

As proposed in [33], the agents update their estimates using a modified incremental
subgradient method. Compared to the standard subgradient method, the local estimate
$x_i(k)$ is replaced by a convex combination of $x_i(k)$ with the estimates received from the
neighbors:

$$x_i(k+1) = \sum_{j=1}^{N} a_{ij}(k)\, x_j(k) - \alpha(k)\, d_i(k), \tag{2.5}$$

where $a_{ij}(k)$ is the $(i, j)$th entry of a stochastic random matrix $A(k)$ which corresponds
to the communication graph $G(k)$. The matrices $A(k)$ form an i.i.d. random process
taking values in a finite set of symmetric stochastic matrices with positive diagonal entries
$\mathcal{A} = \{A_i\}_{i=1}^{M}$, where $A_i$ is a stochastic matrix corresponding to the graph $G_i \in \mathcal{G}$, for
$i = 1, \ldots, M$. The probability distribution of $A(k)$ is inherited from $G(k)$, i.e.
$\Pr(A(k) = A_i) = \Pr(G(k) = G_i) = p_i$. The real valued scalar $\alpha(k)$ is the stepsize, while the
vector $d_i(k) \in \mathbb{R}^n$ is a subgradient of $f_i$ at $x_i(k)$, i.e. $d_i(k) \in \partial f_i(x_i(k))$. When the functions
$f_i(x)$ are assumed differentiable, $d_i(k)$ becomes the gradient of $f_i$ at $x_i(k)$, i.e.
$d_i(k) = \nabla f_i(x_i(k))$.
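The update (2.5) can be sketched numerically as follows. This is only an illustration under assumptions that are ours, not the thesis's: the local costs are taken to be $f_i(x) = \frac{1}{2}\|x - c_i\|^2$ (so that $d_i(k) = x_i(k) - c_i$ and the minimizer of the sum is the mean of the $c_i$), the matrix set contains two explicit pairwise-averaging matrices, and the names `cbmasm`, `A_set`, `pmf` and `c` are hypothetical.

```python
import numpy as np


def cbmasm(A_set, pmf, c, alpha, num_iters, rng):
    """Iterate (2.5) with a constant stepsize for the assumed costs
    f_i(x) = 0.5 * ||x - c_i||^2 (an illustrative choice)."""
    x = np.zeros_like(c)                          # row i holds the estimate x_i(k)
    for _ in range(num_iters):
        A = A_set[rng.choice(len(A_set), p=pmf)]  # i.i.d. draw of A(k)
        d = x - c                                 # row i: gradient of f_i at x_i(k)
        x = A @ x - alpha * d                     # consensus step plus gradient step
    return x


# Two symmetric stochastic matrices with positive diagonal (links 0-1 and 1-2).
A_set = [
    np.array([[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]),
    np.array([[1.0, 0.0, 0.0], [0.0, 0.5, 0.5], [0.0, 0.5, 0.5]]),
]
pmf = [0.5, 0.5]
c = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]])   # minimizer of the sum: [1, 2]

x_final = cbmasm(A_set, pmf, c, alpha=0.05, num_iters=500,
                 rng=np.random.default_rng(0))
print(x_final.mean(axis=0))   # approaches the mean of the c_i
print(x_final)                # individual estimates stay in a neighborhood of it
```

With a constant stepsize the individual estimates are not expected to agree exactly; they remain within a bounded distance of their average, which is the kind of behavior the bounds in this chapter quantify.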

Note  that  the  first  part  of  equation  (2.5)  is  a  consensus  step,  a  problem  that  has 
received  a  lot  of  attention  in  recent  years,  both  in  a  deterministic  ([6,  14,  18,  29,  39,  51, 
52])  and  a  stochastic  ([17,  23,  44,  45])  framework. 

The consensus problem under different gossip algorithms was studied in [1, 7, 12].
We note that there is a direct connection between our communication model and the
communication models used in the randomized gossip protocol [7] and the broadcast
communication protocol [1]. Indeed, in the case of the randomized communication protocol, the
set $\mathcal{G}$ is formed by the graphs $G_{ij}$ with only one link $(i, j)$, where $\Pr(G(k) = G_{ij}) = \frac{1}{N} P_{ij}$
for some $P_{ij} > 0$ with $\sum_{j=1}^{N} P_{ij} = 1$, while the set $\mathcal{A}$ is formed by stochastic matrices $A_{ij}$ of
the form $A_{ij} = I - \frac{1}{2}(e_i - e_j)(e_i - e_j)'$, where the vectors $e_i$ represent the standard basis. In
the case of the broadcast communication protocol, the set $\mathcal{G}$ is formed by the graphs $G_i$,
where $G_i$ contains links between the node $i$ and the nodes in its neighborhood, denoted
by $\mathcal{N}_i$. The probability distribution of $G(k)$ is given by $\Pr(G(k) = G_i) = \frac{1}{N}$ and the set $\mathcal{A}$
is formed by matrices of the form $A_i = I - \delta_i \sum_{j \in \mathcal{N}_i} (e_i - e_j)(e_i - e_j)'$, for some
$0 < \delta_i <$ [upper bound missing in the source].
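The two matrix families above can be constructed and checked against the requirements of the communication model (symmetric, stochastic, positive diagonal) with a short sketch. The helper names are hypothetical, and the value $\delta_i = 0.25$ is an assumption chosen for illustration only.

```python
import numpy as np


def randomized_gossip_matrix(n, i, j):
    """A_ij = I - 0.5 (e_i - e_j)(e_i - e_j)': nodes i and j average their values."""
    v = np.zeros(n)
    v[i], v[j] = 1.0, -1.0
    return np.eye(n) - 0.5 * np.outer(v, v)


def broadcast_type_matrix(n, i, neighbors, delta):
    """A_i = I - delta * sum_{j in N_i} (e_i - e_j)(e_i - e_j)'."""
    A = np.eye(n)
    for j in neighbors:
        v = np.zeros(n)
        v[i], v[j] = 1.0, -1.0
        A -= delta * np.outer(v, v)
    return A


def is_admissible(A):
    """Check symmetry, row stochasticity and positivity of the diagonal."""
    symmetric = np.allclose(A, A.T)
    stochastic = np.allclose(A.sum(axis=1), 1.0) and np.all(A >= 0)
    return bool(symmetric and stochastic and np.all(np.diag(A) > 0))


A_rg = randomized_gossip_matrix(4, 0, 2)
A_bc = broadcast_type_matrix(4, 1, [0, 2], delta=0.25)   # delta is an assumed value
print(is_admissible(A_rg), is_admissible(A_bc))
# A too-large delta drives the diagonal entry of node i negative:
print(is_admissible(broadcast_type_matrix(4, 1, [0, 2], delta=0.6)))
```

The last call illustrates why $\delta_i$ must be bounded above: the diagonal entry of node $i$ equals $1 - \delta_i |\mathcal{N}_i|$ in this construction, so it stays positive only for sufficiently small $\delta_i$.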

The following assumptions, which will not necessarily be used simultaneously,
introduce properties of the function $f(x)$.

Assumption 2.2.2. (Non-differentiable functions)

(a) The subgradients of the functions $f_i(x)$ are uniformly bounded, i.e. there exists a
positive scalar $\varphi$ such that

$$\|d\| \le \varphi, \quad \forall d \in \partial f_i(x), \ \forall x \in \mathbb{R}^n, \ i = 1, \ldots, N,$$

(b) The stepsize is constant, i.e.

$$\alpha(k) = \alpha, \quad \forall k \ge 0,$$

(c) The optimal solution set $X^*$ is nonempty.

Assumption 2.2.3. (Differentiable functions)

(a) The functions $f_i(x)$ are twice differentiable on $\mathbb{R}^n$,

(b) There exist positive scalars $l_i$, $L_i$ such that

$$l_i I \preceq \nabla^2 f_i(x) \preceq L_i I, \quad \forall x \in \mathbb{R}^n \text{ and } \forall i,$$

(c) The stepsize is constant, i.e. $\alpha(k) = \alpha$ for all $k$, and satisfies the inequality

$$0 < \alpha < \min\left\{ \frac{\lambda + 1}{L}, \ \frac{1}{l} \right\},$$

where $\lambda$ is the smallest among all eigenvalues of matrices $A_i$, $l = \min_i l_i$ and $L = \max_i L_i$.

Assumption 2.2.3-(b) is satisfied if the gradient of $f_i(x)$ satisfies a Lipschitz condition
with constant $L_i$, and if $f_i(x)$ is strongly convex with constant $l_i$. Also, under
Assumption 2.2.3, $X^*$ has one element which is the unique minimizer of $f(x)$, denoted henceforth
by $x^*$.

2.3 Preliminary Results


In this section we lay the foundation for our main results in Section 2.4. The
preliminary results introduced here revolve around the idea of providing upper bounds on
a number of quantities of interest. The first quantity is the distance between the estimate
of the optimal decision vector and the average of all estimates. The second quantity is
the distance between the average of all estimates and the minimizer.

We introduce the average vector of estimates of the optimal decision vector,
denoted by $\bar{x}(k)$ and defined by

$$\bar{x}(k) = \frac{1}{N} \sum_{i=1}^{N} x_i(k). \tag{2.6}$$

The dynamic equation for the average vector can be derived from (2.5) and takes the form

$$\bar{x}(k+1) = \bar{x}(k) - \frac{\alpha(k)}{N} h(k), \tag{2.7}$$

where $h(k) = \sum_{i=1}^{N} d_i(k)$.
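The step from (2.5) to (2.7) is short but relies on the columns of $A(k)$ summing to one, which holds here because the matrices are symmetric and stochastic. A sketch of the computation:

```latex
% Averaging (2.5) over the agents and exchanging the order of summation:
\begin{align*}
\bar{x}(k+1)
  &= \frac{1}{N}\sum_{i=1}^{N} x_i(k+1)
   = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N} a_{ij}(k)\,x_j(k)
     - \frac{\alpha(k)}{N}\sum_{i=1}^{N} d_i(k) \\
  &= \frac{1}{N}\sum_{j=1}^{N}\Bigl(\sum_{i=1}^{N} a_{ij}(k)\Bigr) x_j(k)
     - \frac{\alpha(k)}{N}\,h(k) \\
  &= \bar{x}(k) - \frac{\alpha(k)}{N}\,h(k),
\end{align*}
% where the last equality uses \sum_{i} a_{ij}(k) = 1 for every j,
% i.e. A(k) is doubly stochastic since it is symmetric and row stochastic.
```

In particular, the consensus step leaves the average unchanged; only the subgradient step moves it.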

We also introduce the deviation of the local estimates $x_i(k)$ from the average
estimate $\bar{x}(k)$, which is denoted by $z_i(k)$ and defined by

$$z_i(k) = x_i(k) - \bar{x}(k), \quad i = 1, \ldots, N, \tag{2.8}$$

and let $\beta$ be a positive scalar such that

$$\|z_i(0)\| \le \beta, \quad i = 1, \ldots, N.$$

Let us define the aggregate vectors of estimates, average estimates, deviations and
(sub)gradients, respectively:

$$\mathbf{x}(k)' = [x_1(k)', x_2(k)', \ldots, x_N(k)'] \in \mathbb{R}^{Nn},$$

$$\bar{\mathbf{x}}(k)' = [\bar{x}(k)', \bar{x}(k)', \ldots, \bar{x}(k)'] \in \mathbb{R}^{Nn},$$

$$\mathbf{z}(k)' = [z_1(k)', z_2(k)', \ldots, z_N(k)'] \in \mathbb{R}^{Nn}$$

and

$$\mathbf{d}(k)' = [d_1(k)', d_2(k)', \ldots, d_N(k)'] \in \mathbb{R}^{Nn}.$$

From (2.6) we note that the aggregate vector of average estimates can be expressed as

$$\bar{\mathbf{x}}(k) = \mathbf{J}\mathbf{x}(k),$$

where $\mathbf{J} = \frac{1}{N}\mathbb{1}\mathbb{1}' \otimes I$, with $I$ the identity matrix in $\mathbb{R}^{n \times n}$ and $\mathbb{1}$ the vector of all ones in $\mathbb{R}^N$.
Consequently, the aggregate vector of deviations can be written as

$$\mathbf{z}(k) = (\mathbf{I} - \mathbf{J})\mathbf{x}(k), \tag{2.9}$$

where $\mathbf{I}$ is the identity matrix in $\mathbb{R}^{nN \times nN}$. The next Proposition characterizes the dynamics
of the vector $\mathbf{z}(k)$.

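Proposition 2.3.1. The dynamic evolution of the aggregate vector of deviations is given
by

$$\mathbf{z}(k+1) = \mathbf{W}(k)\mathbf{z}(k) - \alpha(k)(\mathbf{I} - \mathbf{J})\mathbf{d}(k), \quad \mathbf{z}(0) = \mathbf{z}_0, \tag{2.10}$$

where $\mathbf{W}(k) = \mathbf{A}(k) - \mathbf{J}$ and $\mathbf{A}(k) = A(k) \otimes I$, with solution

$$\mathbf{z}(k) = \Phi(k, 0)\mathbf{z}(0) - \sum_{s=0}^{k-1} \alpha(s)\,\Phi(k, s+1)\,\mathbf{d}(s), \tag{2.11}$$

where $\Phi(k, s)$ is the transition matrix of (2.10) defined by $\Phi(k, s) = \mathbf{W}(k-1)\mathbf{W}(k-2) \cdots \mathbf{W}(s)$,
with $\Phi(k, k) = \mathbf{I}$.

Proof. From (2.5) the dynamics of the aggregate vector of estimates is given by

$$\mathbf{x}(k+1) = \mathbf{A}(k)\mathbf{x}(k) - \alpha(k)\mathbf{d}(k). \tag{2.12}$$

From (2.9) together with (2.12), we can further write

$$\mathbf{z}(k+1) = (\mathbf{I} - \mathbf{J})\mathbf{x}(k+1) = (\mathbf{A}(k) - \mathbf{J})\mathbf{x}(k) - \alpha(k)(\mathbf{I} - \mathbf{J})\mathbf{d}(k).$$

By noting that

$$(\mathbf{A}(k) - \mathbf{J})\mathbf{z}(k) = (\mathbf{A}(k) - \mathbf{J})(\mathbf{I} - \mathbf{J})\mathbf{x}(k) = (\mathbf{A}(k) - \mathbf{J})\mathbf{x}(k),$$

we obtain (2.10). The solution (2.11) follows from (2.10) together with the observation
that $\Phi(k, s)(\mathbf{I} - \mathbf{J}) = \Phi(k, s)$. □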


Remark 2.3.1. The transition matrix $\boldsymbol{\Phi}(k,s)$ of the stochastic linear equation (2.10) can also be represented as

$$\boldsymbol{\Phi}(k,s) = \left(\prod_{i=1}^{k-s} A(k-i) - J\right) \otimes I, \tag{2.13}$$

where $J = \frac{1}{N}\mathbf{1}\mathbf{1}'$. This follows from the fact that for any $i \in \{1, 2, \ldots, s-1\}$ we have

$$(A(k-i) - J)(A(k-i-1) - J) = A(k-i)A(k-i-1) - J.$$
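The product identity used in this remark is easy to confirm on a small case. The sketch below assumes two hypothetical $3 \times 3$ doubly stochastic matrices (they are illustrative, not taken from the text) and compares both sides of the identity entry by entry.

```python
def matmul(X, Y):
    """Plain-Python matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def matsub(X, Y):
    """Entrywise difference X - Y."""
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


N = 3
J = [[1.0 / N] * N for _ in range(N)]          # J = (1/N) * 1 1'
# Hypothetical doubly stochastic matrices (rows and columns sum to one).
A1 = [[0.50, 0.50, 0.00], [0.50, 0.25, 0.25], [0.00, 0.25, 0.75]]
A2 = [[0.2, 0.3, 0.5], [0.5, 0.2, 0.3], [0.3, 0.5, 0.2]]

lhs = matmul(matsub(A1, J), matsub(A2, J))     # (A1 - J)(A2 - J)
rhs = matsub(matmul(A1, A2), J)                # A1 A2 - J
max_diff = max(abs(lhs[i][j] - rhs[i][j]) for i in range(N) for j in range(N))
```

The identity relies on $A_1 J = J$ (rows sum to one), $J A_2 = J$ (columns sum to one) and $J^2 = J$, which is why double stochasticity is needed.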


Remark 2.3.2 (On the first and second moments of the transition matrix $\boldsymbol{\Phi}(k,s)$). Let $m$ be a positive integer and consider the transition matrix $\boldsymbol{\Phi}(k+m,k) = \mathbf{W}(k+m-1)\cdots\mathbf{W}(k)$, generated by a sequence of length $m$ of random graphs, i.e. $G(k) \ldots G(k+m-1)$, for some $k \ge 0$. The random matrix $\boldsymbol{\Phi}(k+m,k)$ takes values of the form $\mathbf{W}_{i_1}\mathbf{W}_{i_2}\cdots\mathbf{W}_{i_m}$, with $i_j \in \{1, 2, \ldots, M\}$ and $j = 1, \ldots, m$. The norm of a particular realization of $\boldsymbol{\Phi}(k+m,k)$ is given by the LEM of the matrix product $\mathbf{W}_{i_1}\mathbf{W}_{i_2}\cdots\mathbf{W}_{i_m}$ or the SLEM of $A_{i_1}A_{i_2}\cdots A_{i_m}$, denoted henceforth by $\lambda_{i_1 \ldots i_m}$. Let $p_{i_1 \ldots i_m} = \prod_{j=1}^{m} p_{i_j}$ be the probability of the sequence of graphs $G_{i_1} \ldots G_{i_m}$ that appear during the time interval $[k, k+m]$. Let $I_m$ be the set of sequences of indices of length $m$ for which the union of graphs with the respective indices produces a connected graph, i.e. $I_m = \{i_1 i_2 \cdots i_m \mid \bigcup_{j=1}^{m} G_{i_j} \text{ is connected}\}$. Using the previous notations, the first and second moments of the norm of $\boldsymbol{\Phi}(k+m,k)$ can be expressed as

$$E[\|\boldsymbol{\Phi}(k+m,k)\|] = \eta_m, \tag{2.14}$$

$$E[\|\boldsymbol{\Phi}(k+m,k)\|^2] = \rho_m, \tag{2.15}$$

where $\eta_m = \sum_{j \in I_m} p_j\lambda_j + 1 - \sum_{j \in I_m} p_j$ and $\rho_m = \sum_{j \in I_m} p_j\lambda_j^2 + 1 - \sum_{j \in I_m} p_j$. The integer $j$ was used as an index for the elements of the set $I_m$, i.e. for an element of the form $i_1 \ldots i_m$.

The above formulas follow from results introduced in [18], Lemma 1, or in [39], Lemma 3.9, which state that for any sequence of indices $i_1 \ldots i_m \in I_m$, the matrix product $A_{i_1} \cdots A_{i_m}$ is ergodic, and therefore $\lambda_j < 1$ for any $j \in I_m$. Conversely, if $j \notin I_m$ then $\lambda_j = 1$. We also note that $\sum_{j \in I_m} p_j$ is the probability of having a connected graph over a time interval of length $m$. Due to Assumption 2.2.1, for sufficiently large values of $m$, the set $I_m$ is nonempty. In fact, for $m \ge M$, $I_m$ is always nonempty. Therefore, for any $m$ such that $I_m$ is not empty, we have that $0 < \rho_m < \eta_m < 1$. In general, for large values of $m$, it may be difficult to compute all eigenvalues $\lambda_j$, $j \in I_m$. We can avoid computing the eigenvalues $\lambda_j$, and in this way decrease the computational burden, by using the following upper bounds on $\eta_m$ and $\rho_m$:

$$\eta_m \le \lambda_m p_m + 1 - p_m, \tag{2.16}$$

$$\rho_m \le \lambda_m^2 p_m + 1 - p_m, \tag{2.17}$$

where $\lambda_m = \max_{j \in I_m} \lambda_j$ and $p_m = \sum_{j \in I_m} p_j$ is the probability of having a connected graph over a time interval of length $m$. For notational simplicity, in what follows we will omit the index $m$ when referring to the scalars $\eta_m$ and $\rho_m$.

Throughout this chapter we will use the symbols $m$, $\eta$ and $\rho$ in the sense defined
within Remark 2.3.2. Moreover, the value of $m$ is chosen such that $I_m$ is nonempty.
The existence of such a value is guaranteed by Assumption 2.2.1.
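To make the set $I_m$ and the probability $p_m$ concrete, the following sketch enumerates all index sequences of length $m$ and keeps those whose union graph is connected. The three single-edge graphs, the uniform pmf and the value $\lambda_m = 0.5$ are hypothetical choices for illustration only; the helper names are not from the text.

```python
from itertools import product


def is_connected(n_nodes, edges):
    """Undirected connectivity check by depth-first search from node 0."""
    adj = {v: set() for v in range(n_nodes)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for v in adj[u] - seen:
            seen.add(v)
            stack.append(v)
    return len(seen) == n_nodes


def connected_sequences(n_nodes, graphs, m):
    """Index sequences of length m whose union graph is connected (the set I_m)."""
    return [seq for seq in product(range(len(graphs)), repeat=m)
            if is_connected(n_nodes, set().union(*(graphs[i] for i in seq)))]


def prob_connected(n_nodes, graphs, probs, m):
    """p_m: probability that m i.i.d. graph draws have a connected union."""
    total = 0.0
    for seq in connected_sequences(n_nodes, graphs, m):
        p = 1.0
        for i in seq:
            p *= probs[i]
        total += p
    return total


def moment_upper_bounds(lam_m, p_m):
    """Right-hand sides of (2.16) and (2.17)."""
    return lam_m * p_m + 1 - p_m, lam_m ** 2 * p_m + 1 - p_m


# Hypothetical example: 3 nodes, each possible graph is a single edge.
graphs = [{(0, 1)}, {(1, 2)}, {(0, 2)}]
probs = [1 / 3, 1 / 3, 1 / 3]
p1 = prob_connected(3, graphs, probs, 1)
p2 = prob_connected(3, graphs, probs, 2)
p3 = prob_connected(3, graphs, probs, 3)
eta_bound, rho_bound = moment_upper_bounds(0.5, p2)   # lam_m = 0.5 is assumed
```

In this example no single graph is connected, so $I_1$ is empty, while any two distinct edges already connect the three nodes.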

The next proposition gives upper bounds on the expected values of the norm and
the squared norm of the transition matrix $\boldsymbol{\Phi}(k,s)$.

Proposition 2.3.2. Let Assumption 2.2.1 hold, let $r \le s \le k$ be three nonnegative integers and $m$ a positive integer such that the set $I_m$ is nonempty. Then the following inequalities involving the transition matrix $\boldsymbol{\Phi}(k,s)$ of (2.10) hold:

$$E[\|\boldsymbol{\Phi}(k,s)\|] \le \eta^{\lfloor (k-s)/m \rfloor}, \tag{2.18}$$

$$E[\|\boldsymbol{\Phi}(k,s)\|^2] \le \rho^{\lfloor (k-s)/m \rfloor}, \tag{2.19}$$

$$E[\|\boldsymbol{\Phi}(k,r)\boldsymbol{\Phi}(k,s)'\|] \le \rho^{\lfloor (k-s)/m \rfloor}\,\eta^{\lfloor (s-r)/m \rfloor}, \tag{2.20}$$

where $\eta$ and $\rho$ are defined in Remark 2.3.2.


Proof. We fix an $m$ such that the probability of having a connected graph over a time interval of length $m$ is positive, i.e. $I_m$ is nonempty. Note that, by Assumption 2.2.1, such a value always exists (pick $m \ge M$). Let $t$ be the number of intervals of length $m$ between $s$ and $k$, i.e.

$$t = \left\lfloor \frac{k-s}{m} \right\rfloor,$$

and let $s_0, s_1, \ldots, s_t$ be a sequence of nonnegative integers such that $s = s_0 < s_1 < \ldots < s_t \le k$, where $s_{i+1} - s_i = m$ and $i = 0, \ldots, t-1$. By the semigroup property of transition matrices, it follows that

$$\boldsymbol{\Phi}(k,s) = \boldsymbol{\Phi}(k,s_t)\boldsymbol{\Phi}(s_t,s_{t-1})\cdots\boldsymbol{\Phi}(s_1,s),$$

or

$$\|\boldsymbol{\Phi}(k,s)\| \le \|\boldsymbol{\Phi}(s_t,s_{t-1})\| \cdots \|\boldsymbol{\Phi}(s_1,s)\|,$$

where we use the fact that $\|\boldsymbol{\Phi}(k,s_t)\| \le 1$. Using the i.i.d. assumption on the random process $A(k)$, we can further write

$$E[\|\boldsymbol{\Phi}(k,s)\|] \le E[\|\boldsymbol{\Phi}(s_t,s_{t-1})\|] \cdots E[\|\boldsymbol{\Phi}(s_1,s)\|],$$

which together with (2.14) leads to inequality (2.18).

Similarly, inequality (2.19) follows from (2.15) and from the i.i.d. assumption on the random graph process.

We now turn to inequality (2.20). By the semigroup property we get

$$E[\|\boldsymbol{\Phi}(k,r)\boldsymbol{\Phi}(k,s)'\|] \le E[\|\boldsymbol{\Phi}(k,s)\|^2\,\|\boldsymbol{\Phi}(s,r)\|] \le E[\|\boldsymbol{\Phi}(k,s)\|^2]\,E[\|\boldsymbol{\Phi}(s,r)\|],$$

where the second inequality follows from the independence of $A(k)$. Inequality (2.20) follows from (2.18) and (2.19). □
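The three bounds of Proposition 2.3.2 depend only on integer divisions of the time gaps by $m$. A small helper along the following lines (the function name and the numerical values are illustrative assumptions) evaluates the right-hand sides of (2.18)-(2.20).

```python
def transition_bounds(k, s, r, m, eta, rho):
    """Right-hand sides of (2.18), (2.19) and (2.20) for r <= s <= k."""
    if not (0 <= r <= s <= k) or m < 1:
        raise ValueError("need 0 <= r <= s <= k and m >= 1")
    blocks_ks = (k - s) // m          # floor((k - s) / m)
    blocks_sr = (s - r) // m          # floor((s - r) / m)
    first_moment = eta ** blocks_ks
    second_moment = rho ** blocks_ks
    cross_term = rho ** blocks_ks * eta ** blocks_sr
    return first_moment, second_moment, cross_term


# Hypothetical values of eta and rho, chosen only to show the mechanics.
bounds = transition_bounds(k=10, s=4, r=1, m=3, eta=0.5, rho=0.25)
```

Only complete blocks of length $m$ contribute a contraction factor; a partial block at the end is bounded by one, which is why the floor function appears.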

In the next lemma we show that, under Assumption 2.2.3, for small enough $\alpha$ the gradients $\nabla f_i(x_i(k))$ remain bounded with probability one for all $k$.

Lemma 2.3.1. Let Assumption 2.2.3 hold and let $F : \mathbb{R}^{Nn} \to \mathbb{R}$ be a function given by $F(\mathbf{x}) = \sum_{i=1}^{N} f_i(x_i)$, where $\mathbf{x}' = (x_1', \ldots, x_N')$. There exists a positive scalar $\varphi$ such that

$$\|\nabla f_i(x_i(k))\| \le \varphi, \quad \forall\, i, k \ \text{w.p. 1},$$

$$\|\nabla f_i(\bar{x}(k))\| \le \varphi, \quad \forall\, i, k \ \text{w.p. 1},$$

where $\varphi = L\|\mathbf{x}(0) - \tilde{\mathbf{x}}\| + L\left(\frac{2}{1-q} + 1\right)\|\tilde{\mathbf{x}}\|$, $q = \max\{|\lambda - \alpha L|, |1 - \alpha l|\}$, $\tilde{\mathbf{x}}$ is the unique minimizer of $F(\mathbf{x})$, and $x_i(k)$ and $\bar{x}(k)$ satisfy (2.5) and (2.7), respectively.

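Proof. We first note that since the matrices $A_i$ have positive diagonal entries, they are aperiodic and therefore $\lambda \in (-1, 1]$. From Assumption 2.2.3 it follows immediately that $F(\mathbf{x})$ is a convex, twice differentiable function satisfying

$$l\mathbf{I} \le \nabla^2 F(\mathbf{x}) \le L\mathbf{I}, \tag{2.21}$$

where $l = \min_i l_i$, $L = \max_i L_i$ and $\mathbf{I}$ is the identity matrix in $\mathbb{R}^{nN \times nN}$. In addition, $F(\mathbf{x})$ has a unique minimizer denoted by $\tilde{\mathbf{x}}$. The dynamics described by (2.5) can be compactly written as

$$\mathbf{x}(k+1) = \mathbf{A}(k)\mathbf{x}(k) - \alpha\nabla F(\mathbf{x}(k)), \quad \mathbf{x}(0) = \mathbf{x}_0, \tag{2.22}$$

with $\mathbf{x}(k)' = (x_1(k)', \ldots, x_N(k)')$.

We observe that equation (2.22) is a modified version of the gradient method with constant step, where instead of the identity matrix, $\mathbf{A}(k)$ multiplies $\mathbf{x}(k)$. In what follows we show that the stochastic dynamics (2.22) is stable with probability one. Using a similar idea as in Theorem 3, page 25 of [37], we have that

$$\nabla F(\mathbf{x}(k)) = \nabla F(\tilde{\mathbf{x}}) + \int_0^1 \nabla^2 F(\tilde{\mathbf{x}} + \tau(\mathbf{x}(k) - \tilde{\mathbf{x}}))(\mathbf{x}(k) - \tilde{\mathbf{x}})\,d\tau = \mathcal{H}(k)(\mathbf{x}(k) - \tilde{\mathbf{x}}),$$

where $l\mathbf{I} \le \mathcal{H}(k) \le L\mathbf{I}$ by virtue of (2.21). Hence, with probability one

$$\|\mathbf{x}(k+1) - \tilde{\mathbf{x}}\| = \|\mathbf{A}(k)\mathbf{x}(k) - \tilde{\mathbf{x}} - \alpha\nabla F(\mathbf{x}(k)) + \mathbf{A}(k)\tilde{\mathbf{x}} - \mathbf{A}(k)\tilde{\mathbf{x}}\| \le \|\mathbf{A}(k) - \alpha\mathcal{H}(k)\|\,\|\mathbf{x}(k) - \tilde{\mathbf{x}}\| + \|\mathbf{A}(k) - \mathbf{I}\|\,\|\tilde{\mathbf{x}}\|.$$

But since

$$(\lambda - \alpha L)\mathbf{I} \le \mathbf{A}(k) - \alpha\mathcal{H}(k) \le (1 - \alpha l)\mathbf{I},$$

it follows that

$$\|\mathbf{x}(k+1) - \tilde{\mathbf{x}}\| \le q\|\mathbf{x}(k) - \tilde{\mathbf{x}}\| + |\lambda - 1|\,\|\tilde{\mathbf{x}}\|,$$

where $q = \max\{|\lambda - \alpha L|, |1 - \alpha l|\}$. Since by Assumption 2.2.3-(c) $\alpha < \min\left\{\frac{1+\lambda}{L}, \frac{1}{l}\right\}$, we get that $q < 1$ and therefore the dynamics (2.22) is stable with probability one and

$$\|\mathbf{x}(k) - \tilde{\mathbf{x}}\| \le q^k\|\mathbf{x}(0) - \tilde{\mathbf{x}}\| + \frac{2}{1-q}\|\tilde{\mathbf{x}}\| \le \|\mathbf{x}(0) - \tilde{\mathbf{x}}\| + \frac{2}{1-q}\|\tilde{\mathbf{x}}\|, \quad \forall k.$$

From Assumption 2.2.3 we have that

$$\|\nabla f_i(x_i(k))\| \le \|\nabla F(\mathbf{x}(k))\| \le L\|\mathbf{x}(k) - \tilde{\mathbf{x}}\| \le L\|\mathbf{x}(0) - \tilde{\mathbf{x}}\| + \frac{2L}{1-q}\|\tilde{\mathbf{x}}\|. \tag{2.23}$$

We also have that

$$\|\bar{\mathbf{x}}(k) - \tilde{\mathbf{x}}\| = \|\mathbf{J}\mathbf{x}(k) - \mathbf{J}\tilde{\mathbf{x}} + \mathbf{J}\tilde{\mathbf{x}} - \tilde{\mathbf{x}}\| \le \|\mathbf{x}(k) - \tilde{\mathbf{x}}\| + \|\tilde{\mathbf{x}}\|,$$

from where it follows that

$$\|\nabla f_i(\bar{x}(k))\| \le \|\nabla F(\bar{\mathbf{x}}(k))\| \le L\|\bar{\mathbf{x}}(k) - \tilde{\mathbf{x}}\| \le L\|\mathbf{x}(0) - \tilde{\mathbf{x}}\| + L\left(\frac{2}{1-q} + 1\right)\|\tilde{\mathbf{x}}\|. \tag{2.24}$$

Taking the maximum among the right-hand side terms of the inequalities (2.23) and (2.24), the result follows. □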
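The boundedness argument in this proof can be illustrated on a toy instance. The sketch below assumes two agents with scalar quadratic costs $f_i(x) = \frac{l_i}{2}(x - a_i)^2$ and a fixed averaging matrix with eigenvalues 0 and 1; all numerical values are hypothetical and were picked so that $q < 1$. It checks that the distance to the minimizer of $F$ never leaves the radius $\|\mathbf{x}(0) - \tilde{\mathbf{x}}\| + \frac{2}{1-q}\|\tilde{\mathbf{x}}\|$.

```python
import math


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(t * t for t in v))


# Hypothetical quadratic costs f_i(x) = (l_i / 2) * (x - a_i)^2.
curvatures = [1.0, 2.0]            # l_i = L_i for a quadratic
targets = [3.0, -1.0]              # minimizer of F(x) = sum_i f_i(x_i)
A = [[0.5, 0.5], [0.5, 0.5]]       # eigenvalues 0 and 1, so lam = 0
lam = 0.0
l, L = min(curvatures), max(curvatures)
alpha = 0.3                        # chosen so that q < 1
q = max(abs(lam - alpha * L), abs(1 - alpha * l))

x_tilde = list(targets)
x = [10.0, -7.0]
radius = norm([a - b for a, b in zip(x, x_tilde)]) + 2 / (1 - q) * norm(x_tilde)

worst = 0.0
for _ in range(200):
    grad = [c * (xi - a) for c, xi, a in zip(curvatures, x, targets)]
    mixed = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
    x = [mi - alpha * g for mi, g in zip(mixed, grad)]      # iteration (2.22)
    worst = max(worst, norm([a - b for a, b in zip(x, x_tilde)]))
```

Note that the iterates do not converge to $\tilde{\mathbf{x}}$ itself, since the consensus step pulls the two states together; the lemma only claims that they stay in a bounded region.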

Remark 2.3.3. If the stochastic matrices $A_i$ are generated using a Laplacian based scheme, e.g.

$$A_i = I - \epsilon\mathcal{L}_i, \quad \forall i,$$




where  X/  is  the  Laplacian  of  the  graph  G;  and  s  <  j-,  then  it  turns  out  that  A  >  0.  Hence, 
the  inequality  in  Assumption  2. 2. 3 -(c)  is  satisfied  if 

0  <  a  <  — , 

/  j 

which is a sufficient condition for the stability of (2.5). In the case of the randomized and
broadcast gossip protocols it can be checked that $\lambda = 0$.

Remark 2.3.4. Throughout the rest of the chapter $\varphi$ should be interpreted in the context of the assumptions used, i.e. under Assumption 2.2.2, $\varphi$ is the uniform bound on the subgradients of $f_i(x)$, while under Assumption 2.2.3, $\varphi$ is the bound on the gradients $\nabla f_i(x_i(k))$ and $\nabla f_i(\bar{x}(k))$ given by Lemma 2.3.1.

The following lemma gives upper bounds on the first and the second moments of the distance between the estimate $x_i(k)$ and the average of the estimates, $\bar{x}(k)$.


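Lemma 2.3.2. Under Assumptions 2.2.1 and 2.2.2 or 2.2.1 and 2.2.3, for the sequences $\{x_i(k)\}_{k \ge 0}$, $i = 1, \ldots, N$, generated by (2.5) with a constant stepsize $\alpha$, the following inequalities hold:

$$E[\|x_i(k) - \bar{x}(k)\|] \le \beta\sqrt{N}\,\eta^{\lfloor k/m \rfloor} + \alpha\varphi\sqrt{N}\,\frac{m}{1-\eta}, \tag{2.25}$$

$$E[\|x_i(k) - \bar{x}(k)\|^2] \le N\beta^2\rho^{\lfloor k/m \rfloor} + N\alpha^2\varphi^2\left(1 + \frac{2m}{1-\eta}\right)\frac{m}{1-\rho} + 2N\alpha\beta\varphi m\,\frac{\rho^{\lfloor (k-1)/m \rfloor + 1} - \eta^{\lfloor (k-1)/m \rfloor + 1}}{\rho - \eta}, \tag{2.26}$$

where $\eta$, $\rho$ and $m$ are defined in Remark 2.3.2.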
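The right-hand sides of (2.25) and (2.26) each consist of a part that decays with $k$ and a constant floor proportional to the stepsize. The following sketch evaluates both bounds; the function name and the parameter values are assumptions made for illustration, and the formula requires $\rho \ne \eta$.

```python
import math


def deviation_bounds(k, m, eta, rho, alpha, beta, phi, n_agents):
    """Right-hand sides of (2.25) and (2.26); requires rho != eta."""
    N = n_agents
    first = (beta * math.sqrt(N) * eta ** (k // m)
             + alpha * phi * math.sqrt(N) * m / (1 - eta))
    t = (k - 1) // m + 1                       # floor((k - 1) / m) + 1
    second = (N * beta ** 2 * rho ** (k // m)
              + N * alpha ** 2 * phi ** 2 * (1 + 2 * m / (1 - eta)) * m / (1 - rho)
              + 2 * N * alpha * beta * phi * m * (rho ** t - eta ** t) / (rho - eta))
    return first, second


# Hypothetical parameter values.
params = dict(m=2, eta=0.5, rho=0.25, alpha=0.1, beta=1.0, phi=1.0, n_agents=4)
first_0, second_0 = deviation_bounds(k=0, **params)
first_inf, second_inf = deviation_bounds(k=2000, **params)
```

For large $k$ only the floors $\alpha\varphi\sqrt{N}\frac{m}{1-\eta}$ and $N\alpha^2\varphi^2\left(1 + \frac{2m}{1-\eta}\right)\frac{m}{1-\rho}$ remain, so a smaller stepsize tightens the guaranteed agreement among the agents.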


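Proof. Note that the norm of the deviation $z_i(k) = x_i(k) - \bar{x}(k)$ is upper bounded by the norm of the aggregate vector of deviations $\mathbf{z}(k)$ (with probability one), i.e. $\|z_i(k)\| \le \|\mathbf{z}(k)\|$.

Using Proposition 2.3.1, it follows that the matrix $\mathbf{Z}(k) = \mathbf{z}(k)\mathbf{z}(k)'$ satisfies the following dynamic equation

$$\mathbf{Z}(k+1) = \mathbf{W}(k)\mathbf{Z}(k)\mathbf{W}(k)' + \mathbf{F}(k), \tag{2.27}$$

where $\mathbf{F}(k)$ is given by

$$\mathbf{F}(k) = \alpha^2(\mathbf{I}-\mathbf{J})\mathbf{d}(k)\mathbf{d}(k)'(\mathbf{I}-\mathbf{J})' - \alpha\mathbf{W}(k)\mathbf{z}(k)\mathbf{d}(k)'(\mathbf{I}-\mathbf{J})' - \alpha(\mathbf{I}-\mathbf{J})\mathbf{d}(k)\mathbf{z}(k)'\mathbf{W}(k)'. \tag{2.28}$$

The solution of (2.27) is given by

$$\mathbf{Z}(k) = \boldsymbol{\Phi}(k,0)\mathbf{Z}(0)\boldsymbol{\Phi}(k,0)' + \sum_{s=0}^{k-1}\boldsymbol{\Phi}(k,s+1)\mathbf{F}(s)\boldsymbol{\Phi}(k,s+1)'. \tag{2.29}$$

For simplicity, in what follows, we will omit the matrix $\mathbf{I} - \mathbf{J}$ from $\mathbf{F}(k)$ since it disappears by multiplication with the transition matrix (see Proposition 2.3.1). We can further write

$$\|\mathbf{Z}(k)\| \le \|\boldsymbol{\Phi}(k,0)\|^2\|\mathbf{Z}(0)\| + \sum_{s=0}^{k-1}\|\boldsymbol{\Phi}(k,s+1)\mathbf{F}(s)\boldsymbol{\Phi}(k,s+1)'\|,$$

and by noting that $\|\mathbf{Z}(k)\| = \|\mathbf{z}(k)\|^2$, we obtain

$$E[\|\mathbf{z}(k)\|^2] \le E[\|\boldsymbol{\Phi}(k,0)\|^2]\,\|\mathbf{z}(0)\|^2 + \sum_{s=0}^{k-1}E[\|\boldsymbol{\Phi}(k,s+1)\mathbf{F}(s)\boldsymbol{\Phi}(k,s+1)'\|]. \tag{2.30}$$

From (2.19) of Proposition 2.3.2 we obtain

$$E[\|\boldsymbol{\Phi}(k,0)\|^2] \le \rho^{\lfloor k/m \rfloor}.$$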

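We now focus on the terms of the sum in the right-hand side of (2.30). We have

$$\boldsymbol{\Phi}(k,s+1)\mathbf{F}(s)\boldsymbol{\Phi}(k,s+1)' = \alpha^2\boldsymbol{\Phi}(k,s+1)\mathbf{d}(s)\mathbf{d}(s)'\boldsymbol{\Phi}(k,s+1)' - \alpha\boldsymbol{\Phi}(k,s+1)\mathbf{W}(s)\mathbf{z}(s)\mathbf{d}(s)'\boldsymbol{\Phi}(k,s+1)' - \alpha\boldsymbol{\Phi}(k,s+1)\mathbf{d}(s)\mathbf{z}(s)'\mathbf{W}(s)'\boldsymbol{\Phi}(k,s+1)'.$$

Using the solution of $\mathbf{z}(k)$ given in (2.11), we get

$$\boldsymbol{\Phi}(k,s+1)\mathbf{W}(s)\mathbf{z}(s)\mathbf{d}(s)'\boldsymbol{\Phi}(k,s+1)' = \boldsymbol{\Phi}(k,s+1)\mathbf{W}(s)\left[\boldsymbol{\Phi}(s,0)\mathbf{z}(0) - \alpha\sum_{r=0}^{s-1}\boldsymbol{\Phi}(s,r+1)\mathbf{d}(r)\right]\mathbf{d}(s)'\boldsymbol{\Phi}(k,s+1)'$$

$$= \boldsymbol{\Phi}(k,0)\mathbf{z}(0)\mathbf{d}(s)'\boldsymbol{\Phi}(k,s+1)' - \alpha\sum_{r=0}^{s-1}\boldsymbol{\Phi}(k,r+1)\mathbf{d}(r)\mathbf{d}(s)'\boldsymbol{\Phi}(k,s+1)'. \tag{2.31}$$

Similarly,

$$\boldsymbol{\Phi}(k,s+1)\mathbf{d}(s)\mathbf{z}(s)'\mathbf{W}(s)'\boldsymbol{\Phi}(k,s+1)' = \boldsymbol{\Phi}(k,s+1)\mathbf{d}(s)\mathbf{z}(0)'\boldsymbol{\Phi}(k,0)' - \alpha\sum_{r=0}^{s-1}\boldsymbol{\Phi}(k,s+1)\mathbf{d}(s)\mathbf{d}(r)'\boldsymbol{\Phi}(k,r+1)'. \tag{2.32}$$

We now give a more explicit formula for $\boldsymbol{\Phi}(k,s+1)\mathbf{F}(s)\boldsymbol{\Phi}(k,s+1)'$:

$$\boldsymbol{\Phi}(k,s+1)\mathbf{F}(s)\boldsymbol{\Phi}(k,s+1)' = \alpha^2\boldsymbol{\Phi}(k,s+1)\mathbf{d}(s)\mathbf{d}(s)'\boldsymbol{\Phi}(k,s+1)' - \alpha\boldsymbol{\Phi}(k,0)\mathbf{z}(0)\mathbf{d}(s)'\boldsymbol{\Phi}(k,s+1)' + \alpha^2\sum_{r=0}^{s-1}\boldsymbol{\Phi}(k,r+1)\mathbf{d}(r)\mathbf{d}(s)'\boldsymbol{\Phi}(k,s+1)'$$

$$- \alpha\boldsymbol{\Phi}(k,s+1)\mathbf{d}(s)\mathbf{z}(0)'\boldsymbol{\Phi}(k,0)' + \alpha^2\sum_{r=0}^{s-1}\boldsymbol{\Phi}(k,s+1)\mathbf{d}(s)\mathbf{d}(r)'\boldsymbol{\Phi}(k,r+1)'.$$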

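By applying the norm operator, we get

$$\|\boldsymbol{\Phi}(k,s+1)\mathbf{F}(s)\boldsymbol{\Phi}(k,s+1)'\| \le N\alpha^2\varphi^2\|\boldsymbol{\Phi}(k,s+1)\|^2 + N\alpha^2\varphi^2\sum_{r=0}^{s-1}\|\boldsymbol{\Phi}(k,r+1)\boldsymbol{\Phi}(k,s+1)'\| + N\alpha^2\varphi^2\sum_{r=0}^{s-1}\|\boldsymbol{\Phi}(k,s+1)\boldsymbol{\Phi}(k,r+1)'\|$$

$$+ N\alpha\beta\varphi\|\boldsymbol{\Phi}(k,s+1)\boldsymbol{\Phi}(k,0)'\| + N\alpha\beta\varphi\|\boldsymbol{\Phi}(k,0)\boldsymbol{\Phi}(k,s+1)'\|,$$

or

$$\|\boldsymbol{\Phi}(k,s+1)\mathbf{F}(s)\boldsymbol{\Phi}(k,s+1)'\| \le N\alpha^2\varphi^2\|\boldsymbol{\Phi}(k,s+1)\|^2 + 2N\alpha^2\varphi^2\sum_{r=0}^{s-1}\|\boldsymbol{\Phi}(k,r+1)\boldsymbol{\Phi}(k,s+1)'\| + 2N\alpha\beta\varphi\|\boldsymbol{\Phi}(k,s+1)\boldsymbol{\Phi}(k,0)'\|. \tag{2.33}$$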

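Next we derive bounds for the expected values of each of the terms in (2.33). Based on the results of Proposition 2.3.2 we can write

$$E[\|\boldsymbol{\Phi}(k,s+1)\|^2] \le \rho^{\lfloor (k-s-1)/m \rfloor},$$

$$\sum_{r=0}^{s-1}E[\|\boldsymbol{\Phi}(k,r+1)\boldsymbol{\Phi}(k,s+1)'\|] \le \sum_{r=0}^{s-1}\rho^{\lfloor (k-s-1)/m \rfloor}\eta^{\lfloor (s-r)/m \rfloor} \le m\rho^{\lfloor (k-s-1)/m \rfloor}\sum_{r=0}^{\lfloor s/m \rfloor}\eta^{r} \le m\rho^{\lfloor (k-s-1)/m \rfloor}\frac{1 - \eta^{\lfloor s/m \rfloor + 1}}{1-\eta} \le m\rho^{\lfloor (k-s-1)/m \rfloor}\frac{1}{1-\eta}$$

and

$$E[\|\boldsymbol{\Phi}(k,s+1)\boldsymbol{\Phi}(k,0)'\|] \le \rho^{\lfloor (k-s-1)/m \rfloor}\eta^{\lfloor (s+1)/m \rfloor}.$$

Therefore we obtain

$$E[\|\boldsymbol{\Phi}(k,s+1)\mathbf{F}(s)\boldsymbol{\Phi}(k,s+1)'\|] \le N\alpha^2\varphi^2\left(1 + \frac{2m}{1-\eta}\right)\rho^{\lfloor (k-s-1)/m \rfloor} + 2N\alpha\beta\varphi\,\rho^{\lfloor (k-s-1)/m \rfloor}\eta^{\lfloor (s+1)/m \rfloor}.$$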

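We now compute an upper bound for $\sum_{s=0}^{k-1}E[\|\boldsymbol{\Phi}(k,s+1)\mathbf{F}(s)\boldsymbol{\Phi}(k,s+1)'\|]$. Using the fact that

$$\sum_{s=0}^{k-1}\rho^{\lfloor (k-s-1)/m \rfloor} \le m\sum_{s=0}^{\lfloor (k-1)/m \rfloor}\rho^{s} \le m\,\frac{1 - \rho^{\lfloor (k-1)/m \rfloor + 1}}{1-\rho} \le \frac{m}{1-\rho}$$

and

$$\sum_{s=0}^{k-1}\rho^{\lfloor (k-s-1)/m \rfloor}\eta^{\lfloor (s+1)/m \rfloor} \le \sum_{s=0}^{k-1}\rho^{\lfloor (k-s-1)/m \rfloor}\eta^{\lfloor s/m \rfloor} \le m\sum_{s=0}^{\lfloor (k-1)/m \rfloor}\rho^{\lfloor (k-1)/m \rfloor - s}\eta^{s} = m\,\frac{\rho^{\lfloor (k-1)/m \rfloor + 1} - \eta^{\lfloor (k-1)/m \rfloor + 1}}{\rho - \eta},$$

we obtain

$$\sum_{s=0}^{k-1}E[\|\boldsymbol{\Phi}(k,s+1)\mathbf{F}(s)\boldsymbol{\Phi}(k,s+1)'\|] \le N\alpha^2\varphi^2\left(1 + \frac{2m}{1-\eta}\right)\frac{m}{1-\rho} + 2N\alpha\beta\varphi m\,\frac{\rho^{\lfloor (k-1)/m \rfloor + 1} - \eta^{\lfloor (k-1)/m \rfloor + 1}}{\rho - \eta}.$$

Finally we obtain an upper bound for the second moment of $\|\mathbf{z}(k)\|$:

$$E[\|\mathbf{z}(k)\|^2] \le N\beta^2\rho^{\lfloor k/m \rfloor} + N\alpha^2\varphi^2\left(1 + \frac{2m}{1-\eta}\right)\frac{m}{1-\rho} + 2N\alpha\beta\varphi m\,\frac{\rho^{\lfloor (k-1)/m \rfloor + 1} - \eta^{\lfloor (k-1)/m \rfloor + 1}}{\rho - \eta}.$$

□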


The following lemma allows us to interpret $d_i(k)$ as an $\epsilon$-subgradient of $f_i$ at $\bar{x}(k)$ (with $\epsilon$ being a random variable).

Lemma 2.3.3. Let Assumptions 2.2.2 or 2.2.3 hold. Then the vector $d_i(k)$ is an $\epsilon(k)$-subgradient of $f_i$ at $\bar{x}(k)$, i.e. $d_i(k) \in \partial_{\epsilon(k)}f_i(\bar{x}(k))$, and $h(k) = \sum_{i=1}^{N} d_i(k)$ is an $N\epsilon(k)$-subgradient of $f$ at $\bar{x}(k)$, i.e. $h(k) \in \partial_{N\epsilon(k)}f(\bar{x}(k))$, for any $k \ge 0$, where

$$\epsilon(k) = 2\varphi\beta\sqrt{N}\,\|\boldsymbol{\Phi}(k,0)\| + 2\alpha\varphi^2\sqrt{N}\sum_{s=0}^{k-1}\|\boldsymbol{\Phi}(k,s+1)\|. \tag{2.34}$$

Proof. The proof is somewhat similar to the proof of Lemma 3.4.5 of [19]. Let $\bar{d}_i(k)$ be a subgradient of $f_i$ at $\bar{x}(k)$. By the subgradient definition we have that

$$f_i(x_i(k)) \ge f_i(\bar{x}(k)) + \bar{d}_i(k)'(x_i(k) - \bar{x}(k)) \ge f_i(\bar{x}(k)) - \|\bar{d}_i(k)\|\,\|x_i(k) - \bar{x}(k)\|,$$

or

$$f_i(x_i(k)) \ge f_i(\bar{x}(k)) - \varphi\|z_i(k)\|.$$

Furthermore, for any $y \in \mathbb{R}^n$ we have that

$$f_i(y) \ge f_i(x_i(k)) + d_i(k)'(y - x_i(k)) = f_i(x_i(k)) + d_i(k)'(y - \bar{x}(k)) + d_i(k)'(\bar{x}(k) - x_i(k))$$

$$\ge f_i(\bar{x}(k)) + d_i(k)'(y - \bar{x}(k)) - 2\varphi\|z_i(k)\| \ge f_i(\bar{x}(k)) + d_i(k)'(y - \bar{x}(k)) - 2\varphi\|\mathbf{z}(k)\|,$$

or

$$f_i(y) \ge f_i(\bar{x}(k)) + d_i(k)'(y - \bar{x}(k)) - \epsilon(k),$$

where $\epsilon(k) = 2\varphi\|\mathbf{z}(k)\|$. Using the definition of the $\epsilon$-subgradient, it follows that $d_i(k) \in \partial_{\epsilon(k)}f_i(\bar{x}(k))$. Summing over all $i$ we get that $\sum_{i=1}^{N} d_i(k) \in \partial_{N\epsilon(k)}f(\bar{x}(k))$. Note that $\epsilon(k)$ is a random quantity due to the assumptions on $A(k)$. □
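A one-dimensional example makes the $\epsilon$-subgradient interpretation tangible. The sketch below assumes the hypothetical cost $f_i(x) = |x - a|$, for which every subgradient has magnitude at most $\varphi = 1$, and checks on a grid that a subgradient taken at the local estimate satisfies the relaxed inequality at the average, with slack $2\varphi|z_i|$, even though it is not an exact subgradient there.

```python
def f(x, a):
    """Hypothetical local cost f_i(x) = |x - a|."""
    return abs(x - a)


def subgrad(x, a):
    """A subgradient of |x - a| at x."""
    return 1.0 if x > a else (-1.0 if x < a else 0.0)


a = 1.0
phi = 1.0                        # uniform bound on the subgradients of |x - a|
x_i, x_bar = 2.5, 0.4            # local estimate and average (illustrative values)
d_i = subgrad(x_i, a)            # subgradient evaluated at the local estimate
eps = 2 * phi * abs(x_i - x_bar) # slack 2 * phi * |z_i|

grid = [-5 + 0.01 * j for j in range(1001)]
tol = 1e-12
relaxed_holds = all(f(y, a) >= f(x_bar, a) + d_i * (y - x_bar) - eps - tol for y in grid)
exact_holds = all(f(y, a) >= f(x_bar, a) + d_i * (y - x_bar) - tol for y in grid)
```

Here `relaxed_holds` is true while `exact_holds` is false: the slope taken on the other side of the kink violates the exact subgradient inequality at the average, and the slack proportional to the deviation repairs it.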

For twice differentiable cost functions with lower and upper bounded Hessians, the next result gives an upper bound on the second moment of the distance between the average vector $\bar{x}(k)$ and the minimizer of $f$.

Lemma 2.3.4. Let Assumptions 2.2.1 and 2.2.3 hold and let $\{\bar{x}(k)\}_{k \ge 0}$ be a sequence of vectors defined by iteration (2.7). Then, the following inequality holds:

$$E[\|\bar{x}(k) - x^*\|^2] \le \|\bar{x}(0) - x^*\|^2\gamma^k + \frac{4\alpha\varphi\beta\sqrt{N}}{\eta}\,\frac{\gamma^k - \eta^{k/m}}{\gamma - \eta^{1/m}} + \frac{\alpha^2\varphi^2}{1-\gamma}\left(\frac{4\sqrt{N}m}{1-\eta} + 1\right), \tag{2.35}$$

where $\gamma = 1 - \alpha l$, with $l = \min_i l_i$, and $\eta$ is defined in Remark 2.3.2.


Proof. Under Assumption 2.2.3, $f(x)$ is a strongly convex function with constant $Nl$, where $l = \min_i l_i$, and therefore it follows that

$$f(x) - f^* \ge \frac{Nl}{2}\|x - x^*\|^2. \tag{2.36}$$

We use the same idea as in the proof of Proposition 2.4 in [30], formulated under a deterministic setup. By (2.7), where we use a constant stepsize $\alpha$, we obtain

$$\|\bar{x}(k+1) - x^*\|^2 = \left\|\bar{x}(k) - x^* - \frac{\alpha}{N}h(k)\right\|^2 \le \|\bar{x}(k) - x^*\|^2 - \frac{2\alpha}{N}h(k)'(\bar{x}(k) - x^*) + \alpha^2\varphi^2.$$

Using the fact that, by Lemma 2.3.3, $h(k)$ is an $N\epsilon(k)$-subgradient of $f$ at $\bar{x}(k)$, we have

$$f(x^*) \ge f(\bar{x}(k)) + h(k)'(x^* - \bar{x}(k)) - N\epsilon(k),$$

or, from inequality (2.36),

$$h(k)'(\bar{x}(k) - x^*) \ge \frac{Nl}{2}\|\bar{x}(k) - x^*\|^2 - N\epsilon(k).$$

Further, we can write

$$\|\bar{x}(k+1) - x^*\|^2 \le (1 - \alpha l)\|\bar{x}(k) - x^*\|^2 + 2\alpha\epsilon(k) + \alpha^2\varphi^2,$$

or

$$E[\|\bar{x}(k) - x^*\|^2] \le (1 - \alpha l)^k\|\bar{x}(0) - x^*\|^2 + \sum_{s=0}^{k-1}(1 - \alpha l)^{k-s-1}\left(2\alpha E[\epsilon(s)] + \alpha^2\varphi^2\right).$$

Note that from Assumption 2.2.3-(c), $0 < \alpha < \frac{1}{l}$ and therefore the quantity $\gamma^k = (1 - \alpha l)^k$ does not grow unbounded. It follows that

$$E[\|\bar{x}(k) - x^*\|^2] \le \gamma^k\|\bar{x}(0) - x^*\|^2 + \sum_{s=0}^{k-1}\gamma^{k-s-1}\left(2\alpha E[\epsilon(s)] + \alpha^2\varphi^2\right). \tag{2.37}$$

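From the expression of $\epsilon(k)$ in Lemma 2.3.3, we immediately obtain the following inequality

$$E[\epsilon(s)] \le 2\varphi\beta\sqrt{N}\,\eta^{\lfloor s/m \rfloor} + 2\alpha\varphi^2\sqrt{N}\,\frac{m}{1-\eta}. \tag{2.38}$$

The inequality

$$\sum_{s=0}^{k-1}\gamma^{k-s-1}\eta^{\lfloor s/m \rfloor} \le \frac{1}{\eta}\sum_{s=0}^{k-1}\gamma^{k-s-1}\left(\eta^{1/m}\right)^{s} \le \frac{1}{\eta}\,\frac{\gamma^k - \eta^{k/m}}{\gamma - \eta^{1/m}}$$

yields

$$\sum_{s=0}^{k-1}\gamma^{k-s-1}E[\epsilon(s)] \le \frac{2\varphi\beta\sqrt{N}}{\eta}\,\frac{\gamma^k - \eta^{k/m}}{\gamma - \eta^{1/m}} + \frac{2\alpha\varphi^2\sqrt{N}m}{1-\eta}\,\frac{1}{1-\gamma}, \tag{2.39}$$

which combined with (2.37) generates the inequality (2.35).

□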


2.4  Main  Results  -  Error  bounds 

In the following we provide upper bounds for two performance metrics of the CBMASM. First, we give a bound on the difference between the best recorded value of the cost function $f$, evaluated at the estimate $x_i(k)$, and the optimal value $f^*$. Second, we focus on the second moment of the distance between the estimate $x_i(k)$ and the minimizer of $f$. For a particular class of twice differentiable functions, we give an upper bound on this metric and show how fast the time-varying part of this bound converges to zero. The bounds we give in this section emphasize the effect of the random topology on the performance metrics.

The following result shows how close the cost function $f$ evaluated at the estimate $x_i(k)$ gets to the optimal value $f^*$. A similar result for the standard subgradient method can be found in [31], for example.

Corollary 2.4.1. Let Assumptions 2.2.1 and 2.2.2 or 2.2.1 and 2.2.3 hold and let $\{x_i(k)\}_{k \ge 0}$ be a sequence generated by the iteration (2.5), $i = 1, \ldots, N$. Let $f_i^{best}(k) = \min_{s=0,\ldots,k}E[f(x_i(s))]$ be the smallest cost value (on average) achieved by agent $i$ up to iteration $k$. Then

$$\lim_{k \to \infty} f_i^{best}(k) \le f^* + 3\alpha\varphi^2 N\sqrt{N}\,\frac{m}{1-\eta} + \frac{N\alpha\varphi^2}{2}. \tag{2.40}$$

Proof. Using the subgradient definition of $f_j$ at $x_i(k)$ we have that

$$f_j(x_i(k)) \le f_j(\bar{x}(k)) + \varphi\|z_i(k)\|, \quad \text{for all } j = 1, \ldots, N.$$

Summing over all $j$, we get

$$f(x_i(k)) \le f(\bar{x}(k)) + N\varphi\|\mathbf{z}(k)\|,$$

which holds with probability one. Subtracting $f^*$ from both sides of the above inequality, and applying the expectation operator, we further get

$$E[f(x_i(k))] - f^* \le E[f(\bar{x}(k))] - f^* + N\varphi E[\|\mathbf{z}(k)\|],$$

or

$$f_i^{best}(k) - f^* \le \min_{s=0,\ldots,k}\left\{E[f(\bar{x}(s))] - f^* + N\varphi E[\|\mathbf{z}(s)\|]\right\}. \tag{2.41}$$

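Let $x^* \in X^*$ be an optimal point of $f$. By (2.7), where we use a constant stepsize $\alpha$, we obtain

$$\|\bar{x}(k+1) - x^*\|^2 = \left\|\bar{x}(k) - x^* - \frac{\alpha}{N}h(k)\right\|^2 \le \|\bar{x}(k) - x^*\|^2 - \frac{2\alpha}{N}h(k)'(\bar{x}(k) - x^*) + \alpha^2\varphi^2$$

and since, by Lemma 2.3.3, $h(k)$ is an $N\epsilon(k)$-subgradient of $f$ at $\bar{x}(k)$, we have

$$\|\bar{x}(k+1) - x^*\|^2 \le \|\bar{x}(k) - x^*\|^2 - \frac{2\alpha}{N}\left(f(\bar{x}(k)) - f^*\right) + 2\alpha\epsilon(k) + \alpha^2\varphi^2,$$

or

$$\|\bar{x}(k) - x^*\|^2 \le \|\bar{x}(0) - x^*\|^2 - \frac{2\alpha}{N}\sum_{s=0}^{k-1}\left(f(\bar{x}(s)) - f^*\right) + 2\alpha\sum_{s=0}^{k-1}\epsilon(s) + k\alpha^2\varphi^2.$$

Since $\|\bar{x}(k) - x^*\|^2 \ge 0$,

$$\frac{2\alpha}{N}\sum_{s=0}^{k-1}\left(f(\bar{x}(s)) - f^*\right) \le \|\bar{x}(0) - x^*\|^2 + 2\alpha\sum_{s=0}^{k-1}\epsilon(s) + k\alpha^2\varphi^2,$$

or

$$\sum_{s=0}^{k-1}\left(E[f(\bar{x}(s))] - f^*\right) \le \frac{N}{2\alpha}\|\bar{x}(0) - x^*\|^2 + N\sum_{s=0}^{k-1}E[\epsilon(s)] + \frac{kN\alpha\varphi^2}{2}.$$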

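Adding and subtracting $N\varphi E[\|\mathbf{z}(s)\|]$ inside the sum of the left-hand side of the above inequality and recalling from Lemma 2.3.3 that $\epsilon(k) = 2\varphi\|\mathbf{z}(k)\|$, we obtain

$$\sum_{s=0}^{k-1}\left(E[f(\bar{x}(s))] - f^* + N\varphi E[\|\mathbf{z}(s)\|]\right) \le \frac{N}{2\alpha}\|\bar{x}(0) - x^*\|^2 + \frac{3N}{2}\sum_{s=0}^{k-1}E[\epsilon(s)] + \frac{kN\alpha\varphi^2}{2}.$$

Using the fact that

$$\sum_{s=0}^{k-1}\left(E[f(\bar{x}(s))] - f^* + N\varphi E[\|\mathbf{z}(s)\|]\right) \ge k\min_{s=0,\ldots,k-1}\left\{E[f(\bar{x}(s))] - f^* + N\varphi E[\|\mathbf{z}(s)\|]\right\},$$

we get

$$\min_{s=0,\ldots,k-1}\left\{E[f(\bar{x}(s))] - f^* + N\varphi E[\|\mathbf{z}(s)\|]\right\} \le \frac{N}{2\alpha k}\|\bar{x}(0) - x^*\|^2 + \frac{3N}{2k}\sum_{s=0}^{k-1}E[\epsilon(s)] + \frac{N\alpha\varphi^2}{2}.$$

Using inequality (2.38) we obtain

$$\sum_{s=0}^{k-1}E[\epsilon(s)] \le 2\varphi\beta\sqrt{N}\,\frac{m}{1-\eta} + 2k\alpha\varphi^2\sqrt{N}\,\frac{m}{1-\eta}.$$

It follows that

$$\min_{s=0,\ldots,k-1}\left\{E[f(\bar{x}(s))] - f^* + N\varphi E[\|\mathbf{z}(s)\|]\right\} \le \frac{N}{2\alpha k}\|\bar{x}(0) - x^*\|^2 + \frac{3N}{2k}\left(2\varphi\beta\sqrt{N}\,\frac{m}{1-\eta} + 2k\alpha\varphi^2\sqrt{N}\,\frac{m}{1-\eta}\right) + \frac{N\alpha\varphi^2}{2}. \tag{2.42}$$

Combining inequalities (2.41) and (2.42) and taking the limit, we obtain

$$\lim_{k \to \infty} f_i^{best}(k) \le f^* + 3\alpha\varphi^2 N\sqrt{N}\,\frac{m}{1-\eta} + \frac{N\alpha\varphi^2}{2}.$$

□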


In  the  case  of  twice  differentiable  functions,  the  next  result  introduces  an  error 
bound  which  essentially  says  that  the  estimates  “converge  in  the  mean  square  sense  to 
within  some  guaranteed  distance”  from  the  optimal  point,  distance  which  can  be  made 
arbitrarily  small  by  an  appropriate  choice  of  the  stepsize.  In  addition,  the  time  varying 
component  of  the  error  bound  converges  to  zero  at  least  linearly. 

Corollary 2.4.2. Let Assumptions 2.2.1 and 2.2.3 hold. Then, for the sequence $\{x_i(k)\}_{k \ge 0}$ generated by iteration (2.5) we have

(a)

$$\limsup_{k \to \infty} E[\|x_i(k) - x^*\|^2] \le C_1 + C_2 + 2\sqrt{C_1 C_2}, \tag{2.43}$$

where

$$C_1 = \frac{\alpha^2\varphi^2}{1-\gamma}\left(\frac{4m\sqrt{N}}{1-\eta} + 1\right), \qquad C_2 = N\alpha^2\varphi^2\left(1 + \frac{2m}{1-\eta}\right)\frac{m}{1-\rho}. \tag{2.44}$$

(b)

$$E[\|x_i(k) - x^*\|^2] \le \psi(k) + C, \tag{2.45}$$

where $\psi(k) = c\delta^k$ with $c$ a positive constant depending on the initial conditions, $\delta = \max\{\gamma, \eta^{1/m}\}$, $\gamma = 1 - \alpha l$, and where $C = 4\max\{C_1, C_2\}$.
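The constants in (2.43)-(2.45) are straightforward to evaluate once $m$, $\eta$, $\rho$ and the problem data are fixed. The following sketch computes them for two stepsizes; the function name and all numerical values are hypothetical, chosen only to show how the stepsize trades guaranteed accuracy against the decay factor $\delta$.

```python
import math


def error_bound_constants(m, eta, rho, alpha, phi, l, n_agents):
    """C1, C2 from (2.44), the limsup bound (2.43), C and delta from (2.45)."""
    gamma = 1.0 - alpha * l
    c1 = (alpha ** 2 * phi ** 2 / (1.0 - gamma)
          * (4.0 * m * math.sqrt(n_agents) / (1.0 - eta) + 1.0))
    c2 = (n_agents * alpha ** 2 * phi ** 2
          * (1.0 + 2.0 * m / (1.0 - eta)) * m / (1.0 - rho))
    return {
        "C1": c1,
        "C2": c2,
        "limsup_bound": c1 + c2 + 2.0 * math.sqrt(c1 * c2),
        "C": 4.0 * max(c1, c2),
        "delta": max(gamma, eta ** (1.0 / m)),
    }


# Hypothetical problem data; only alpha differs between the two calls.
common = dict(m=2, eta=0.5, rho=0.25, phi=1.0, l=1.0, n_agents=4)
coarse = error_bound_constants(alpha=0.1, **common)
fine = error_bound_constants(alpha=0.01, **common)
```

With these values the smaller stepsize gives a smaller asymptotic bound but a decay factor closer to one, in line with the accuracy versus speed trade-off of subgradient methods.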

Proof. By the triangle inequality we have

$$\|x_i(k) - x^*\|^2 \le \|x_i(k) - \bar{x}(k)\|^2 + 2\|x_i(k) - \bar{x}(k)\|\,\|\bar{x}(k) - x^*\| + \|\bar{x}(k) - x^*\|^2,$$

or

$$E[\|x_i(k) - x^*\|^2] \le E[\|x_i(k) - \bar{x}(k)\|^2] + 2E[\|x_i(k) - \bar{x}(k)\|\,\|\bar{x}(k) - x^*\|] + E[\|\bar{x}(k) - x^*\|^2].$$

By the Cauchy-Schwarz inequality for the expectation operator, we get

$$E[\|x_i(k) - x^*\|^2] \le E[\|x_i(k) - \bar{x}(k)\|^2] + 2E[\|x_i(k) - \bar{x}(k)\|^2]^{\frac{1}{2}}E[\|\bar{x}(k) - x^*\|^2]^{\frac{1}{2}} + E[\|\bar{x}(k) - x^*\|^2]. \tag{2.46}$$

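Inequality (2.35) can be further upper bounded by

$$E[\|\bar{x}(k) - x^*\|^2] \le \phi_1(k) + C_1,$$

where

$$\phi_1(k) = \left(\|\bar{x}(0) - x^*\|^2 + \frac{4\alpha\varphi\beta\sqrt{N}}{\eta}\,\frac{1}{|\gamma - \eta^{1/m}|}\right)\delta^k = c_1\delta^k,$$

with $\delta = \max\{\gamma, \eta^{1/m}\}$ and $C_1$ being given in (2.44). Using the inequalities

$$\rho^{\lfloor k/m \rfloor} \le \frac{1}{\rho}\,\rho^{k/m} \quad \text{and} \quad \eta^{\lfloor (k-1)/m \rfloor + 1} \le \eta^{-1/m}\eta^{k/m},$$

from (2.26), a new bound for $E[\|x_i(k) - \bar{x}(k)\|^2]$ is given by

$$E[\|x_i(k) - \bar{x}(k)\|^2] \le \phi_2(k) + C_2,$$

where $C_2$ is given in (2.44) and

$$\phi_2(k) = \left(\frac{N\beta^2}{\rho} + \frac{2N\alpha\beta\varphi m}{\eta - \rho}\left(\eta^{-1/m} + \rho^{-1/m}\right)\right)\delta^k = c_2\delta^k.$$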


Taking the limit of (2.46) and recalling that under Assumptions 2.2.1 and 2.2.3, $\gamma < 1$ and $\eta^{1/m} < 1$ for any $m$ for which $I_m$ is nonempty, we obtain (2.43).

Inequality (2.46) can be further upper bounded by

$$E[\|x_i(k) - x^*\|^2] \le 4\max\{c_1, c_2\}\delta^k + 4\max\{C_1, C_2\} = \psi(k) + C,$$

where $\psi(k) = c\delta^k$, with $c = 4\max\{c_1, c_2\}$ and $C = 4\max\{C_1, C_2\}$. Hence, we obtained that the time-varying component of the error bound converges linearly to zero with a factor $\delta = \max\{\gamma, \eta^{1/m}\}$.

□


2.4.1  Discussion  of  the  results 

We obtained upper bounds on two performance metrics relevant to the CBMASM. First, we studied the difference between the cost function evaluated at the estimate and the optimal value (Corollary 2.4.1), for non-differentiable and differentiable functions with bounded (sub)gradients. Second, for a particular class of convex functions (see Assumption 2.2.3), we gave an upper bound for the second moment of the distance between the estimates of the agents and the minimizer. We also showed that the time-varying component of this upper bound converges linearly to zero with a factor reflecting the contribution of the random topology. We introduced Assumption 2.2.3 to cover part of the class of convex functions for which uniform boundedness of the (sub)gradients cannot be guaranteed.

From our results we can notice that the stepsize has a similar influence as in the case of the standard subgradient method, i.e. a small value of $\alpha$ implies good precision but a slow rate of convergence, while a larger value of $\alpha$ increases the rate of convergence but at a cost in accuracy. More importantly, we can emphasize the influence of the consensus step on the performance of the distributed algorithm. When possible, by appropriately designing the probability distribution of the random graph (together with an appropriate choice of the integer $m$) we can improve the guaranteed precision of the algorithm (intuitively, this means making the quantities $\frac{m}{1-\eta}$ and $\frac{m}{1-\rho}$ as small as possible). In addition, the rate of convergence of the time-varying component of the error bound (2.45) can be improved by making the quantity $\eta^{1/m}$ as small as possible. Note however that there are limits with respect to the positive effect of the consensus step on the rate of convergence of $\psi(k)$, since the latter is determined by the maximum between $\gamma$ and $\eta^{1/m}$. Indeed, if the stepsize is small enough, i.e.

$$\alpha < \frac{1}{l}\left(1 - \eta^{1/m}\right), \tag{2.47}$$

then the rate of convergence of $\psi(k)$ is given by $\gamma$. This suggests that having a fast consensus step will not necessarily be helpful in the case of a small stepsize, which is in accordance with the intuition on the role of a small value of $\alpha$. In the case inequality (2.47) is not satisfied, the rate of convergence of $\psi(k)$ is determined by $\eta^{1/m}$. However, this does not necessarily mean that the estimates will not "converge faster to within some distance of the minimizer", since we are providing only an error bound.
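The threshold (2.47) can be turned into a small diagnostic that reports which of the two mechanisms limits the decay of $\psi(k)$. The sketch below is illustrative only; the function name and the numbers are assumptions.

```python
def psi_decay_factor(alpha, l, eta, m):
    """Return (delta, threshold, limiting mechanism) for the bound (2.45)."""
    gamma = 1.0 - alpha * l
    consensus_rate = eta ** (1.0 / m)
    threshold = (1.0 - consensus_rate) / l        # right-hand side of (2.47)
    delta = max(gamma, consensus_rate)
    limited_by = "stepsize" if alpha < threshold else "consensus"
    return delta, threshold, limited_by


# Hypothetical values: l = 1, eta = 0.25, m = 2, so eta**(1/m) = 0.5.
small_step = psi_decay_factor(alpha=0.1, l=1.0, eta=0.25, m=2)
large_step = psi_decay_factor(alpha=0.8, l=1.0, eta=0.25, m=2)
```

With the small stepsize the factor is $\gamma = 0.9$ and a faster consensus step would not improve it; with the large stepsize the factor is $\eta^{1/m} = 0.5$ and the consensus step becomes the bottleneck.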

Assume that we are using the centralized subgradient method to minimize the convex function $f(x) = \sum_{i=1}^{N} f_i(x)$ satisfying Assumption 2.2.2 (the subgradients of $f_i(x)$ are uniformly bounded by $\varphi$), where the stepsize used is $N$ times smaller than the stepsize of the distributed algorithm, i.e.

$$x(k+1) = x(k) - \frac{\alpha}{N}d(k),$$

where $d(k)$ is a subgradient of $f$ at $x(k)$, with $\|d(k)\| \le N\varphi$. Then, from the optimization literature we get

$$\lim_{k \to \infty} f^{best}(k) \le f^* + \frac{N\alpha\varphi^2}{2},$$

where $f^{best}(k) = \min_{s=0,\ldots,k} f(x(s))$. From the above we note that, compared with the centralized subgradient method with a stepsize $N$ times smaller than the agents' stepsize, the distributed optimization algorithm introduces an additional term in the error bound given by $3\alpha\varphi^2 N\sqrt{N}\,\frac{m}{1-\eta}$, which reflects the influence of the dimension of the network and of the random topology on the guaranteed accuracy of the algorithm.
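The comparison between the two asymptotic bounds can be made numerically. The sketch below evaluates the centralized gap $\frac{N\alpha\varphi^2}{2}$ and the distributed gap from (2.40); function names and values are hypothetical.

```python
import math


def centralized_gap(alpha, phi, n_agents):
    """Asymptotic bound on f_best - f* for the centralized method (step alpha / N)."""
    return n_agents * alpha * phi ** 2 / 2.0


def distributed_gap(alpha, phi, n_agents, m, eta):
    """Asymptotic bound (2.40) on f_i_best - f* for the distributed algorithm."""
    extra = 3.0 * alpha * phi ** 2 * n_agents * math.sqrt(n_agents) * m / (1.0 - eta)
    return extra + centralized_gap(alpha, phi, n_agents)


# Hypothetical values.
gap_c = centralized_gap(alpha=0.1, phi=1.0, n_agents=4)
gap_d = distributed_gap(alpha=0.1, phi=1.0, n_agents=4, m=2, eta=0.5)
extra_term = gap_d - gap_c
```

The extra term scales like $N^{3/2}$ and like $\frac{m}{1-\eta}$, so both a larger network and a slower mixing topology loosen the guarantee; it is a bound, not a measured gap.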

Let us now assume that we are minimizing the function $f(x)$, satisfying Assumptions 2.2.3-(a)(b), using a centralized gradient algorithm:

$$x(k+1) = x(k) - \frac{\alpha}{N}\nabla f(x(k)),$$


where  we  have  that  a  is  small  enough  (0  <  a  <  f  )  so  that  the  algorithm  is  stable  and  there 
exit  pc  so  that  ||Vy/(.r(k))||  <  pc.  It  follows  that  we  can  get  the  following  upper  bound  on 

the  distance  between  the  estimate  of  the  optimal  decision  vector  and  the  minimizer 

2 

\\x(k)-x*\\2  <\\x(0)-x*\\2ykc  +  ^. 


with $\gamma_c = 1 - \alpha l$. Therefore, we can see that $\gamma = \gamma_c$, which shows that the rates at which the time-varying components of the error bounds converge to zero in the centralized and distributed cases are the same. However, note that we assumed the stepsize in the centralized case to be $N$ times smaller than the stepsize used by the agents.

The error bounds (2.40) and (2.45) are functions of three quantities induced by the consensus step: $\frac{m}{1-\eta}$, $\frac{m}{1-\rho}$ and $\eta^{1/m}$. These quantities show the dependence of the performance metrics on the pmf of $G(k)$ and on the corresponding random matrix $A(k)$. The scalars $\eta$ and $\rho$ represent the first and second moments of the SLEM of the random matrix $A(k+1) \cdots A(k+m)$, corresponding to a random graph formed over a time interval of length $m$, respectively. We notice from our results that the performance of the CBMASM is improved by making $\frac{m}{1-\eta}$, $\frac{m}{1-\rho}$ and $\eta^{1/m}$ as small as possible, i.e. by optimizing these quantities having as decision variables $m$ and the pmf of $G(k)$. For instance, if we are interested in obtaining a tight bound on $E[\|x_i(k) - x^*\|^2]$ and having a fast decrease to zero of $\psi(k)$, we can formulate the following multi-criteria optimization problem:

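$$\begin{aligned}
\min_{m,\,p_i}\quad & \left\{\eta^{1/m},\; C_1 + C_2 + 2\sqrt{C_1 C_2}\right\} \\
\text{subject to:}\quad & m \ge 1, \\
& \eta^{1/m} \ge \gamma, \\
& \textstyle\sum_i p_i = 1,\; p_i \ge 0,
\end{aligned} \qquad (2.48)$$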

where $C_1$ and $C_2$ were defined in (2.44). The second inequality constraint was added to emphasize the fact that making $\eta^{1/m}$ too small is pointless, since the rate of convergence of $\psi(k)$ is limited by $\gamma$. If we are simultaneously interested in tightening the upper bounds of both metrics, we can introduce the quantity $\frac{m}{1-\rho}$ in the optimization problem, since $\frac{m}{1-\eta}$ and $\frac{m}{1-\rho}$ are not necessarily minimized by the same probability distribution. The solution to the above problem is a set of Pareto points, i.e. solution points for which improvement in one objective can only occur with the worsening of at least one other objective.
We note that for each fixed value of $m$, the three quantities are minimized if the scalars $\eta$ and $\rho$ are minimized as functions of the pmf of the random graph. An approximate solution of (2.48) can be obtained by focusing only on minimizing $\frac{m}{1-\eta}$, since both $\eta^{1/m}$ and $\frac{m}{1-\rho}$ are upper bounded by this quantity. Therefore, an approximate solution can be obtained by minimizing $\eta$ (i.e. computing the optimal pmf) for each value of $m$, and then picking the value of $m$, with the corresponding $\eta$, that minimizes $\frac{m}{1-\eta}$. Depending on the communication model used, the pmf of the random graph can depend on a set of parameters of the communication protocol (transmission power, probability of collisions, etc.), and therefore we can potentially tune these parameters so that the performance of the CBMASM is improved.

In what follows we provide a simple example where we show how $\eta$, the optimal probability distribution, $\frac{m}{1-\eta}$ and $\eta^{1/m}$ evolve as functions of $m$.

Example 2.4.1. Let $G(k)$ be a random graph process taking values in the set $\mathcal{G} = \{G_1, G_2\}$, with probabilities $p$ and $1-p$, respectively. The graphs $G_1$ and $G_2$ are shown in Figure 2.1. Also, let $A(k)$ be a (stochastic) random matrix, corresponding to $G(k)$, taking values in the set $\mathcal{A} = \{A_1, A_2\}$, with

Figure 2.1: The sample space of the random graph $G(k)$

Figure 2.2(a) shows the optimal probability $p^*$ that minimizes $\eta$ for different values of $m$. Figure 2.2(b) shows the optimized $\eta$ (computed at $p^*$) as a function of $m$. Figures 2.2(c) and 2.2(d) show the evolution of the optimized $\frac{m}{1-\eta}$ and $\eta^{1/m}$ as functions of $m$, from where we notice that a Pareto solution is obtained for $m = 5$ and $p^* = 0.582$.

In order to obtain the solution of problem (2.48), we need to compute the probability of all possible sequences of length $m$ produced by $G(k)$, together with the SLEM of their corresponding stochastic matrices. For large values of $m$ and $M$, this task may prove numerically expensive. We can somewhat reduce the computational burden by using the bounds on $\eta$ and $\rho$ introduced in (2.16) and (2.17), respectively. Note that every result concerning the performance metrics still holds. In this case, for each value of $m$, the upper bound on $\eta$ is minimized when $p_m$ is maximized, which can be interpreted as having to choose a pmf that maximizes the probability of connectivity of the union of the random graphs obtained over a time interval of length $m$.

Figure 2.2: (a) Optimal $p$ as a function of $m$; (b) Optimized $\eta$ as a function of $m$; (c) Optimized $\frac{m}{1-\eta}$ as a function of $m$; (d) Optimized $\eta^{1/m}$ as a function of $m$.

Even in the case where we use the bound on $\eta$, it may be very difficult to compute the expression for $p_m$ for large values of $m$ (the set $\mathcal{G}$ may allow for a large number of possible unions of graphs that produce connected graphs). Another way to simplify our problem even further is to (intelligently) fix a value for $m$ and try to maximize $p_m$ with the pmf as the decision variable. We note that $m$ should be chosen such that, within a time interval of length $m$, a connected graph can be obtained. Also, a very large value for $m$ should be avoided, since $\frac{m}{1-\eta}$ is lower bounded by $m$. Although in general the uniform distribution does not necessarily minimize $\eta$, it becomes the optimizer under some particular assumptions, stated in what follows. Let $\mathcal{G}$ be such that a connected graph can be obtained only over a time interval of length $M$ (i.e. in order to form a connected graph, all graphs in $\mathcal{G}$ must appear within a sequence of length $M$). Choose $M$ as the value for $m$. It follows that $p_m$ can be expressed as:

$$p_m = m!\prod_{i=1}^{M} p_i.$$

We can immediately observe that $p_m$ is maximized for the uniform distribution, i.e. $p_i = \frac{1}{M}$, for $i = 1, \ldots, M$.
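The claim about the uniform distribution can be checked numerically on a small case. The sketch below is only an illustration: it assumes the graphs are drawn independently at each time step, and the function names and the two example distributions are hypothetical. It compares the closed form $M!\prod_i p_i$ with a brute-force enumeration of all sequences of length $M$ in which every graph appears.

```python
import itertools
import math

def prob_all_graphs_appear(p):
    """Closed form M! * prod(p_i): every one of the M graphs appears in M i.i.d. draws."""
    return math.factorial(len(p)) * math.prod(p)

def prob_all_graphs_appear_brute_force(p):
    """Same probability, obtained by enumerating all sequences of length M."""
    M = len(p)
    total = 0.0
    for seq in itertools.product(range(M), repeat=M):
        if len(set(seq)) == M:  # every graph shows up exactly once
            total += math.prod(p[i] for i in seq)
    return total

uniform = [1 / 3, 1 / 3, 1 / 3]   # illustrative pmf over M = 3 graphs
skewed = [0.5, 0.3, 0.2]          # illustrative non-uniform pmf

p_uniform = prob_all_graphs_appear(uniform)
p_skewed = prob_all_graphs_appear(skewed)
```

For these two distributions the uniform pmf gives the larger probability, in line with the arithmetic-geometric mean inequality applied to $\prod_i p_i$ under the constraint $\sum_i p_i = 1$.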

2.5  Application  -  Distributed  System  Identification 

In  this  section  we  show  how  the  distributed  optimization  algorithm  analyzed  in  the 
previous  section  can  be  used  to  perform  collaborative  system  identification.  We  assume 
the  following  scenario:  a  group  of  sensors  track  an  object  by  taking  measurements  of 
its  position.  These  sensors  have  memory  and  computation  capabilities  and  are  organized 
in  a  communication  network  modeled  by  a  random  graph  process  G(k)  satisfying  the 
assumptions  introduced  in  Section  II.  The  task  of  the  sensors/agents  is  to  determine  a 
parametric  model  of  the  object’s  trajectory.  The  measurements  are  affected  by  noise, 
whose  effect  may  differ  from  sensor  to  sensor  (i.e.  some  sensors  take  more  accurate 
measurements than others). This can happen, for instance, when some sensors are closer to the object than others (allowing a better reading of the position), or when sensors with different precision classes are used. Determining a model for the time evolution of the object's position can be useful in motion prediction when the motion dynamics of the object are unknown to the sensors. The notation used in the following is independent of that used in the previous sections.

2.5.1  System  identification  model 

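Let $p(t)' = [x(t), y(t), z(t)]$ be the position vector of the tracked object. We model the time evolution of each coordinate of the position vector as a time-dependent polynomial of degree $n_a$, i.e.

$$\begin{aligned}
x(t) &= a_{0,x} + a_{1,x}t + \ldots + a_{n_a,x}t^{n_a}, \\
y(t) &= a_{0,y} + a_{1,y}t + \ldots + a_{n_a,y}t^{n_a}, \\
z(t) &= a_{0,z} + a_{1,z}t + \ldots + a_{n_a,z}t^{n_a}.
\end{aligned} \qquad (2.49)$$

The measurements of each sensor $i$ are given by

$$\begin{aligned}
x_i(t) &= x(t) + e_{i,x}(t), \\
y_i(t) &= y(t) + e_{i,y}(t), \\
z_i(t) &= z(t) + e_{i,z}(t),
\end{aligned} \qquad (2.50)$$

where $e_{i,x}(t)$, $e_{i,y}(t)$ and $e_{i,z}(t)$ are assumed to be white noises of (unknown) variances $\sigma^2_{i,x}$, $\sigma^2_{i,y}$ and $\sigma^2_{i,z}$, respectively. Equivalently, we have

$$\begin{aligned}
x_i(t) &= \varphi(t)'\theta_x + e_{i,x}(t), \\
y_i(t) &= \varphi(t)'\theta_y + e_{i,y}(t), \\
z_i(t) &= \varphi(t)'\theta_z + e_{i,z}(t),
\end{aligned} \qquad (2.51)$$

where $\varphi(t)' = [1, t, \ldots, t^{n_a}]$ and $\theta_x = [a_{0,x}, \ldots, a_{n_a,x}]'$, $\theta_y = [a_{0,y}, \ldots, a_{n_a,y}]'$ and $\theta_z = [a_{0,z}, \ldots, a_{n_a,z}]'$.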

In the following we focus only on one coordinate of the position vector, say $x(t)$. The analysis can be repeated in the same way for the other two coordinates. Let $T$ be the total number of measurements taken by the sensors and consider the following quadratic cost functions

$$J_i(\theta_x) = \sum_{t=1}^{T}\left(x_i(t) - \varphi(t)'\theta_x\right)^2, \quad \forall i.$$


Using its own measurements, sensor $i$ can determine a parametric model for the time evolution of the coordinate $x(t)$ by solving the optimization problem:

$$\min_{\theta_x} J_i(\theta_x). \qquad (2.52)$$

Let $X_i' = [x_i(1), \ldots, x_i(T)]$ be the vector of measurements of sensor $i$ and let $\Phi' = [\varphi(1), \ldots, \varphi(T)]$ be the matrix formed by the regression vectors. It is well known that the optimal solution of (2.52) is given by

$$\hat{\theta}_{i,x} = (\Phi'\Phi)^{-1}\Phi' X_i. \qquad (2.53)$$

Remark 2.5.1. It can be shown that $\Phi'\Phi$ is invertible for any $T$, but it becomes ill-conditioned for large values of $T$. That is why, for our numerical simulations, we will in fact use an orthogonal basis to model the time evolution of the coordinates $x(t)$, $y(t)$ and $z(t)$.
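The conditioning issue mentioned in the remark can be illustrated with a short numerical sketch. The following is not the code used for the simulations in this chapter: the time grid, polynomial degree, noise level and the use of a QR factorization as the orthogonalization step are all assumptions made for illustration. It solves the local problem (2.52) once through the normal equations (2.53) and once through an orthonormal basis of the range of $\Phi$.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_a = 200, 5                                   # illustrative sizes
t = np.linspace(0.0, 1.0, T)                      # assumed time grid
Phi = np.vander(t, n_a + 1, increasing=True)      # row t is phi(t)' = [1, t, ..., t^n_a]

theta_true = rng.standard_normal(n_a + 1)         # hypothetical true coefficients
X_i = Phi @ theta_true + 0.01 * rng.standard_normal(T)   # noisy measurements of one sensor

# Local least-squares estimate through the normal equations, as in (2.53).
theta_ne = np.linalg.solve(Phi.T @ Phi, Phi.T @ X_i)

# Same estimate through an orthonormal basis Q of the range of Phi (Phi = Q R).
Q, R = np.linalg.qr(Phi)
theta_bar = Q.T @ X_i                             # coefficients in the orthonormal basis
theta_qr = np.linalg.solve(R, theta_bar)          # mapped back to polynomial coefficients

cond_gram = np.linalg.cond(Phi.T @ Phi)           # large: monomials are nearly collinear
cond_orth = np.linalg.cond(Q.T @ Q)               # close to 1
```

Both routes give essentially the same coefficients on this small problem, but the Gram matrix of the monomial basis is already poorly conditioned at degree five, while the orthonormal basis keeps the condition number near one.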


Performing a localized system identification does not take into account the measurements of the other sensors, which can potentially enhance the identified model. If all the measurements are centralized, a model for the time evolution of $x(t)$ can be computed by solving

$$\min_{\theta_x} J(\theta_x),$$

where

$$J(\theta_x) = \sum_{i=1}^{N} J_i(\theta_x). \qquad (2.54)$$
Note that (2.54) fits the framework of the distributed optimization problem formulated in the previous sections, and therefore can be solved distributively, eliminating the need for sharing all measurements with all other sensors.

Remark 2.5.2. If each sensor has a priori information about its accuracy, then the cost function (2.54) can be replaced with

$$J(\theta_x) = \sum_{i=1}^{N} \delta_{i,x} J_i(\theta_x), \qquad (2.55)$$

where $\delta_{i,x}$ is a positive scalar such that the more accurate sensor $i$ is, the larger $\delta_{i,x}$ is. The scalar $\delta_{i,x}$ can be interpreted as trust in the measurements taken by sensor $i$. The sensors can use local identification to compute $\delta_{i,x}$. For instance, $\delta_{i,x}$ can be chosen as $\delta_{i,x} = \frac{1}{\hat{\sigma}^2_{i,x}}$, where $\hat{\sigma}^2_{i,x}$ is given by

$$\hat{\sigma}^2_{i,x} = \frac{1}{T}\sum_{t=1}^{T}\left(x_i(t) - \varphi(t)'\hat{\theta}_{i,x}\right)^2,$$

where $\hat{\theta}_{i,x}$ is the local estimate of the model for the time evolution of $x(t)$.

The distributed optimization algorithm (2.5) can be written as

$$\theta_{i,x}(k+1) = \sum_{j=1}^{N} a_{ij}(k)\theta_{j,x}(k) - \alpha\nabla J_i(k), \qquad (2.56)$$

where $\nabla J_i(k) = -2\Phi'(X_i - \Phi\theta_{i,x}(k))$.
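A minimal simulation of the update (2.56) may help fix ideas. The sketch below is an illustration under stated assumptions, not the experiment reported later: the network size, noise levels, stepsize, the orthonormalized regressors and the pairwise-averaging consensus matrices on a ring are all hypothetical choices. Because the consensus matrices are doubly stochastic and the gradients are linear in the estimates, the network average of the estimates converges to the centralized least-squares solution, while the individual estimates stay in a neighborhood of it.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, n_a = 8, 100, 2                              # illustrative sizes
t = np.linspace(-1.0, 1.0, T)
Phi, _ = np.linalg.qr(np.vander(t, n_a + 1, increasing=True))  # orthonormal regressors (assumption)

theta_true = np.array([1.0, -2.0, 0.5])            # hypothetical model coefficients
noise_std = 0.01 * np.arange(1, N + 1)             # sensors with different accuracies
X = Phi @ theta_true + noise_std[:, None] * rng.standard_normal((N, T))  # row i holds X_i'

# Minimizer of the centralized cost (2.54): least squares on the averaged measurements.
theta_central = np.linalg.lstsq(Phi, X.mean(axis=0), rcond=None)[0]

def pairwise_averaging_matrix(N, rng):
    """Doubly stochastic matrix averaging a random node with a random ring neighbor."""
    i = rng.integers(N)
    j = (i + rng.choice([-1, 1])) % N
    v = np.zeros(N)
    v[i], v[j] = 1.0, -1.0
    return np.eye(N) - 0.5 * np.outer(v, v)

alpha = 0.01                                       # assumed constant stepsize
theta = np.zeros((N, n_a + 1))                     # row i holds theta_{i,x}(k)'
for k in range(5000):
    A = pairwise_averaging_matrix(N, rng)
    grad = -2.0 * (X - theta @ Phi.T) @ Phi        # row i is the gradient of J_i at theta_i
    theta = A @ theta - alpha * grad               # update (2.56)

worst_error = np.max(np.linalg.norm(theta - theta_central, axis=1))
```

With a constant stepsize the individual estimates do not coincide exactly; `worst_error` measures the residual distance of the worst agent from the centralized solution.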

2.5.2  Numerical  simulations 

In this section we simulate the distributed system identification algorithm under two gossip communication protocols: the randomized gossip protocol [7] and the broadcast gossip protocol [1]. We perform the simulations on a circular graph, where we assume that the cardinality of the neighborhoods of the nodes is two. This graph is a particular example of small world graphs [53] (for an analysis of the consensus problem under small-world-like communication topologies, the reader can consult [3], for example).


Figure  2.3:  Circular  graph  with  N  =  8 


In the case of the randomized gossip protocol, the set of consensus matrices is given by

$$\mathcal{A} = \{A_{ij},\ i = 1 \ldots N,\ j \in \{i-1, i+1\}\},$$

where $A_{ij} = I - \frac{1}{2}(e_i - e_j)(e_i - e_j)'$ and where by convention we assume that if $i = N$ then $i + 1 = 1$ and if $i = 1$ then $i - 1 = N$. We assume that if node $i$ wakes up, it chooses with uniform distribution between its two neighbors. Hence the probability distribution of the random matrix $A(k)$ is given by

$$Pr(A(k) = A_{ij}) = \frac{1}{2N}.$$


We note that the minimum value of $m$ such that $\eta_m < 1$ is $N - 1$. Recall that $m$ is the length of a time interval such that $Pr\left(\bigcup_{l=0}^{m-1} G(k+l) \text{ is connected}\right) > 0$ for any $k$. It turns out that for $m = N - 1$

$$p_c^r = Pr\left(\bigcup_{l=0}^{N-2} G(k+l) \text{ is connected}\right) = N!\left(\frac{1}{N}\right)^{N-1}.$$

Interestingly, the matrix products of length $N - 1$ of the form $\prod_{i=1}^{N-1} A_{i+i_0,\,i+1+i_0}$ with $i_0 \in \{0, \ldots, N-1\}$, and the matrix products that may be obtained by permuting the matrices in the aforementioned matrix products, have the same SLEM (where the summations in the indices are taken modulo $N$). In fact it is exactly this property that allows us to give the following explicit expression for $\eta_{N-1}$:

$$\eta^r_{N-1} = p_c^r\lambda^r + 1 - p_c^r, \qquad (2.57)$$

where $\lambda^r$ is the SLEM of the matrix product $A_{1,2}A_{2,3}\cdots A_{N-1,N}$.

In the case of the broadcast gossip protocol, the set $\mathcal{A}$ is given by

$$\mathcal{A} = \{A_i,\ i = 1 \ldots N\},$$

where $A_i = I - \frac{1}{3}\left[(e_i - e_{i+1})(e_i - e_{i+1})' + (e_i - e_{i-1})(e_i - e_{i-1})'\right]$ and $Pr(A(k) = A_i) = \frac{1}{N}$. For odd values of $N$ (and $N > 3$), the minimum value of $m$ such that $\eta_m < 1$ is given by $m = \frac{N-1}{2}$. In addition, we have that

$$p_c^b = Pr\left(\bigcup_{l=0}^{\frac{N-3}{2}} G(k+l) \text{ is connected}\right).$$

Observing a similar phenomenon as in the case of the randomized gossip protocol, namely that the matrix products $A_{1+i_0}A_{3+i_0}\cdots A_{N-2+i_0}$ for $i_0 \in \{0, \ldots, N-1\}$ and their permutations have the same SLEM (where, as before, the summations of indices are taken modulo $N$), we obtain the formula

$$\eta^b_{\frac{N-1}{2}} = p_c^b\lambda^b + 1 - p_c^b,$$

where $\lambda^b$ is the SLEM of the matrix product $A_1A_3\cdots A_{N-2}$.

The values for $\eta^r_{N-1}$ and $\eta^b_{\frac{N-1}{2}}$, computed above for the two gossip protocols, do not necessarily provide tight error bounds, since we considered minimal time interval lengths such that $\eta_m < 1$. Even for this relatively simple type of graph, analytical formulas for $\eta_m$, for large values of $m$, are more difficult to obtain due to an increase in combinatorial complexity and because different matrix products that appear in the expression of $\eta$ do not necessarily have the same SLEM. However, we did compute numerical estimates for different values of $m$. Figures 2.4 and 2.5 show estimates of the three quantities of interest, $\eta$, $\frac{m}{1-\eta}$ and $\eta^{1/m}$, as functions of $m$, for $N = 11$ (the estimates were computed by taking averages over 2000 realizations and are shown together with the 95% confidence intervals). We can see that $\frac{m}{1-\eta}$ is minimized for $m \approx 55$ in the case of the randomized gossip protocol and for $m \approx 30$ in the case of the broadcast gossip protocol, while the best achievable values of $\eta^{1/m}$ are approximately equal for the two protocols (i.e. 0.985 for the randomized gossip protocol and 0.982 for the broadcast gossip protocol).
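The kind of Monte Carlo estimate described above can be sketched as follows. This is an illustration rather than the code behind Figures 2.4 and 2.5: it assumes a ring, uniform wake-up of the nodes, pairwise-averaging matrices of the form $I - \frac{1}{2}(e_i - e_j)(e_i - e_j)'$, and a normal approximation for the 95% interval; the function names, the network size and the number of realizations are hypothetical.

```python
import numpy as np

def slem(P):
    """Second largest eigenvalue modulus of a square matrix."""
    moduli = np.sort(np.abs(np.linalg.eigvals(P)))
    return moduli[-2]

def randomized_gossip_matrix(N, rng):
    """Node i (uniform) averages with one of its two ring neighbors (uniform)."""
    i = rng.integers(N)
    j = (i + rng.choice([-1, 1])) % N
    v = np.zeros(N)
    v[i], v[j] = 1.0, -1.0
    return np.eye(N) - 0.5 * np.outer(v, v)

def estimate_eta(N, m, n_real, rng):
    """Sample mean of the SLEM of a product of m random matrices, with a 95% half-width."""
    samples = np.empty(n_real)
    for r in range(n_real):
        P = np.eye(N)
        for _ in range(m):
            P = P @ randomized_gossip_matrix(N, rng)
        samples[r] = slem(P)
    half_width = 1.96 * samples.std(ddof=1) / np.sqrt(n_real)
    return samples.mean(), half_width

rng = np.random.default_rng(0)
eta_short, _ = estimate_eta(N=5, m=3, n_real=200, rng=rng)        # m < N - 1: union never connected
eta_long, hw_long = estimate_eta(N=5, m=40, n_real=200, rng=rng)  # long window: union almost surely connected
```

For $m < N - 1$ the union of the sampled graphs cannot be connected, so every product has SLEM equal to one and the estimate of $\eta$ equals one; for a long window the estimate drops below one.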

Next we present numerical simulations of the distributed system identification algorithm presented in the previous subsection, under the randomized and broadcast gossip protocols. We would like to point out that, in order to maintain numerical stability, in our numerical simulations we used an orthogonalized version of $\Phi$, given by $\bar{\Phi} = \Phi H$, where the columns of $\bar{\Phi}$ form an orthogonal basis of the range of $\Phi$, and the new vector of parameters $\bar{\theta}$ is related to the original one through $\theta = H\bar{\theta}$, where $H$ is a linear transformation matrix whose entries depend on the orthogonalization process used (Gram-Schmidt, Householder transformations, etc.).
Figure 2.4: Estimates of $\eta$, $\frac{m}{1-\eta}$ and $\eta^{1/m}$ for the randomized gossip protocol and for $N = 11$

Figure 2.5: Estimates of $\eta$, $\frac{m}{1-\eta}$ and $\eta^{1/m}$ for the broadcast gossip protocol and for $N = 11$

Therefore, the cost function we are minimizing can be rewritten as

$$\bar{J}(\bar{\theta}_x) = \sum_{i=1}^{N} \bar{J}_i(\bar{\theta}_x),$$

where $\bar{J}_i(\bar{\theta}_x) = \|X_i - \bar{\Phi}\bar{\theta}_x\|^2$.

It is easy to check that in the case of the two protocols, $\lambda$ (the smallest of all eigenvalues of matrices belonging to the set $\mathcal{A}$) is zero. In addition, Assumptions 2.2.3-(a)(b) are satisfied for $l_i = L_i = 2$, and for $\alpha < \frac{1}{2}$ the distributed optimization algorithm is guaranteed to be stable with probability one (recall Lemma 2.3.1). From above we see that $\eta^{1/m}$ cannot be smaller than 0.98 for either protocol, for any $m$. Therefore, although we can choose $\alpha > 0.01$, which in turn implies $\gamma < 0.98$, our analysis cannot guarantee a rate of convergence for $\psi(k)$ smaller than 0.98, since the rate of convergence is upper bounded by the maximum between $\gamma$ and $\eta^{1/m}$. However, this does not mean that faster rates of convergence cannot be achieved, which in fact is shown in our numerical simulations.

Figures 2.7 and 2.8 present numerical simulations of the distributed system identification algorithm for the two protocols and for a circular graph with $N = 11$. In our numerical experiments we considered $T = 786$ measurements of the $x$-coordinate of the trajectory depicted in Figure 2.6. We assumed that the $x$-coordinate measurements are affected by white, Gaussian noise with a signal-to-noise ratio given by $SNR_i = 5 \times i$ dB, for $i = 1 \ldots 11$. The time polynomials modeling the trajectory evolution are chosen of degree ten, i.e. $n_a = 10$. We plot estimates of two metrics, $\max_i E[\|\theta_{i,x}(k) - \theta_x^*\|]$ and $\max_i E[f(\theta_{i,x}(k))] - f^*$, for different values of $\alpha$ (the estimates were computed by taking averages over 500 realizations). We note that for larger values of $\alpha$, under the two protocols, the algorithm has roughly the same rate of convergence, but the broadcast protocol is more accurate. This is in accordance with our analysis since, as Figures 2.4 and 2.5 show, for any $m$ the quantities which control the guaranteed accuracy are smaller under the broadcast protocol. For smaller values of $\alpha$, under both protocols the algorithm becomes more accurate and the rate of convergence decreases, since the parameter $\gamma$ becomes larger and therefore dominant.

Figure 2.6: Trajectory of the object
Figure 2.7: Estimate of $\max_i E[\|\theta_{i,x}(k) - \theta_x^*\|]$ for the randomized and broadcast gossip protocols (panel (b): $\alpha = 0.01$, $N = 11$; panel (c): $\alpha = 0.003$, $N = 11$)

Figure 2.8: Estimate of $\max_i E[f(\theta_{i,x}(k))] - f^*$ for the randomized and broadcast gossip protocols (panel (b): $\alpha = 0.01$, $N = 11$; panel (c): $\alpha = 0.003$, $N = 11$)
Chapter 3

Distributed Asymptotic Agreement Problem on Convex Metric Spaces

3.1 Introduction

A convex metric space is a metric space endowed with a convex structure. In this chapter we generalize the asymptotic consensus problem to the more general case of convex metric spaces and emphasize the fundamental role of convexity and, in particular, of the convex hull of a finite set of points. Tsitsiklis showed in [51] that, under some minimal connectivity assumptions on the communication network, if an agent updates its value by choosing a point (in $\mathbb{R}^n$) from the (interior) of the convex hull of its current value and the current values of its neighbors, then asymptotic convergence to consensus is achieved. We will show that this idea extends naturally to the more general case of convex metric spaces.

Our main contributions are as follows. First, after citing relevant results concerning convex metric spaces, we study the properties of the distance between two points belonging to two, possibly overlapping, convex hulls of two finite sets of points. These properties will prove to be crucial in proving the convergence of the agreement algorithm. Second, we provide a dynamic equation for an upper bound of the vector of distances between the current values of the agents. We show that the agents asymptotically reach agreement by showing that this upper bound asymptotically converges to zero. Third, we characterize the agreement point(s) compared to the initial values of the agents, by giving upper bounds on the distance between the agreement point(s) and the initial values in terms of the distances between the initial values of the agents. Fourth, we emphasize the relevance of our framework by providing an application in the form of a consensus of opinion algorithm. For this example we define a particular convex metric space and we study in more depth the properties of the convex hull of a finite set of points.

The chapter is organized as follows. Section 3.2 introduces the main concepts related to convex metric spaces and focuses in particular on the convex hull of a finite set. Section 3.3 formulates the problem and states our main theorem. Section 3.4 gives the proof of our main theorem together with some auxiliary results. In Section 3.6 we present an application of our main result by providing an iterative algorithm for reaching consensus of opinion.

Some basic notations: Given $W \in \mathbb{R}^{n \times n}$, by $[W]_{ij}$ we refer to the $(i,j)$ element of the matrix. The underlying graph of $W$ is a graph of order $n$ for which every edge corresponds to a non-zero, non-diagonal entry of $W$. We will denote by $\mathbb{1}_{\{A\}}$ the indicator function of event $A$. Given some space $X$ we denote by $\mathcal{P}(X)$ the set of all subsets of $X$.

3.2  Convex  Metric  Spaces 

The  first  part  of  this  section  deals  with  a  set  of  definitions  and  basic  results  about 
convex  metric  spaces.  The  second  part  focuses  on  the  convex  hull  of  a  finite  set  in  convex 
metric  spaces. 
3.2.1  Definitions and Results on Convex Metric Spaces

For  more  details  about  the  following  definitions  and  results  the  reader  is  invited  to 
consult  [46], [49]. 

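Definition 3.2.1. Let $(\mathcal{X}, d)$ be a metric space. A mapping $\psi : \mathcal{X} \times \mathcal{X} \times [0,1] \to \mathcal{X}$ is said to be a convex structure on $\mathcal{X}$ if

$$d(u, \psi(x,y,\lambda)) \le \lambda d(u,x) + (1-\lambda)d(u,y), \quad \forall x,y,u \in \mathcal{X} \text{ and } \forall\lambda \in [0,1]. \qquad (3.1)$$

Definition 3.2.2. The metric space $(\mathcal{X}, d)$ together with the convex structure $\psi$ is called a convex metric space.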

A  Banach  space  and  each  of  its  subsets  are  convex  metric  spaces.  There  are  ex¬ 
amples  of  convex  metric  spaces  not  embedded  in  any  Banach  space.  The  following  two 
examples  are  taken  from  [49] . 

Example 3.2.1. Let $I$ be the unit interval $[0,1]$ and $\mathcal{X}$ be the family of closed intervals $[a_i, b_i]$ such that $0 \le a_i \le b_i \le 1$. For $I_i = [a_i, b_i]$, $I_j = [a_j, b_j]$ and $\lambda \in I$, we define a mapping $\psi$ by $\psi(I_i, I_j, \lambda) = [\lambda a_i + (1-\lambda)a_j,\ \lambda b_i + (1-\lambda)b_j]$ and define a metric $d$ in $\mathcal{X}$ by the Hausdorff distance, i.e.

$$d(I_i, I_j) = \max\{|a_i - a_j|, |b_i - b_j|\}.$$
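The convex structure inequality (3.1) can be spot-checked numerically for this interval example. The sketch below is only a sanity check on random samples, not a proof; intervals are represented as pairs (a, b), and the helper names, number of trials and seed are illustrative assumptions.

```python
import random

def psi(I, J, lam):
    """Convex structure on closed subintervals of [0, 1], with intervals stored as (a, b)."""
    return (lam * I[0] + (1 - lam) * J[0], lam * I[1] + (1 - lam) * J[1])

def d(I, J):
    """Hausdorff distance between two closed intervals."""
    return max(abs(I[0] - J[0]), abs(I[1] - J[1]))

def random_interval(rng):
    """Random closed interval [a, b] with 0 <= a <= b <= 1."""
    a, b = sorted((rng.random(), rng.random()))
    return (a, b)

def worst_violation(n_trials=10000, seed=0):
    """Largest value of d(U, psi(I, J, lam)) - [lam d(U, I) + (1 - lam) d(U, J)] over random samples."""
    rng = random.Random(seed)
    worst = float("-inf")
    for _ in range(n_trials):
        U, I, J = random_interval(rng), random_interval(rng), random_interval(rng)
        lam = rng.random()
        gap = d(U, psi(I, J, lam)) - (lam * d(U, I) + (1 - lam) * d(U, J))
        worst = max(worst, gap)
    return worst

violation = worst_violation()
```

A non-positive value of `violation` (up to floating-point rounding) is what inequality (3.1) predicts for every sample.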

Example 3.2.2. We consider a linear space $L$ which is also a metric space with the following properties:

(a) For $x, y \in L$, $d(x,y) = d(x - y, 0)$;

(b) For $x, y \in L$ and $\lambda \in [0,1]$,

$$d(\lambda x + (1-\lambda)y, 0) \le \lambda d(x, 0) + (1-\lambda)d(y, 0).$$

Hence $L$, together with the convex structure $\psi(x,y,\lambda) = \lambda x + (1-\lambda)y$, is a convex metric space.

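Definition 3.2.3. Let $\mathcal{X}$ be a convex metric space. A nonempty subset $K \subset \mathcal{X}$ is said to be convex if $\psi(x,y,\lambda) \in K$, $\forall x,y \in K$ and $\forall\lambda \in [0,1]$.

We define the set valued mapping $\tilde{\psi} : \mathcal{P}(\mathcal{X}) \to \mathcal{P}(\mathcal{X})$ as

$$\tilde{\psi}(A) = \{\psi(x,y,\lambda) \mid \forall x,y \in A, \forall\lambda \in [0,1]\}, \qquad (3.2)$$

where $A$ is an arbitrary set in $\mathcal{X}$.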

In  [49]  it  is  shown  that,  in  a  convex  metric  space,  an  arbitrary  intersection  of  convex 
sets  is  also  convex  and  therefore  the  next  definition  makes  sense. 

Definition 3.2.4. The convex hull of the set $A \subset \mathcal{X}$ is the intersection of all convex sets in $\mathcal{X}$ containing $A$ and is denoted by $co(A)$.

Another characterization of the convex hull of a set in $\mathcal{X}$ is given in what follows. By defining $A_m = \tilde{\psi}(A_{m-1})$ with $A_0 = A$ for some $A \subset \mathcal{X}$, it is discussed in [46] that the set sequence $\{A_m\}_{m \ge 0}$ is increasing and $\limsup A_m$ exists, and $\limsup A_m = \liminf A_m = \lim A_m = \bigcup_{m=0}^{\infty} A_m$.

Proposition 3.2.1 ([46]). Let $\mathcal{X}$ be a convex metric space. The convex hull of a set $A \subset \mathcal{X}$ is given by

$$co(A) = \lim A_m = \bigcup_{m=0}^{\infty} A_m. \qquad (3.3)$$

It follows immediately from above that if $A_{m+1} = A_m$ for some $m$, then $co(A) = A_m$.
3.2.2  On the convex hull of a finite set

For a positive integer $n$, let $A = \{x_1, \ldots, x_n\}$ be a finite set in $\mathcal{X}$ with convex hull $co(A)$ and let $z$ belong to $co(A)$. By Proposition 3.2.1 it follows that there exists a positive integer $m$ such that $z \in A_m$. But since $A_m = \tilde{\psi}(A_{m-1})$ it follows that there exist $z_1, z_2 \in A_{m-1}$ and $\lambda_{(1,2)} \in [0,1]$ such that $z = \psi(z_1, z_2, \lambda_{(1,2)})$. Similarly, there exist $z_3, z_4, z_5, z_6 \in A_{m-2}$ and $\lambda_{(3,4)}, \lambda_{(5,6)} \in [0,1]$ such that $z_1 = \psi(z_3, z_4, \lambda_{(3,4)})$ and $z_2 = \psi(z_5, z_6, \lambda_{(5,6)})$. By further decomposing $z_3, z_4, z_5$ and $z_6$ and their followers until they are expressed as functions of elements of $A$, and using graph theory terminology, we note that $z$ can be viewed as the root of a weighted binary tree with leaves belonging to the set $A$. Each node $\alpha$ (except the leaves) has two children $\alpha_1$ and $\alpha_2$, which are related to it through the operator $\psi$ in the sense that $\alpha = \psi(\alpha_1, \alpha_2, \lambda)$ for some $\lambda \in [0,1]$. The weights of the edges connecting $\alpha$ with $\alpha_1$ and $\alpha_2$ are given by $\lambda$ and $1 - \lambda$, respectively.

From the above discussion we note that for any point $z \in co(A)$ there exists a non-negative integer $m$ such that $z$ is the root of a binary tree of height $m$ whose leaves are elements of $A$. The binary tree rooted at $z$ may or may not be a perfect binary tree, i.e. a full binary tree in which all leaves are at the same depth. That is because on some branches of the tree the points in $A$ are reached faster than on others. Let $n_i$ denote the number of times $x_i$ appears as a leaf node, with $\sum_{i=1}^{n} n_i \le 2^m$, and let $m_{i_l}$ be the length of the $i_l$-th path from the root $z$ to the node $x_i$, for $l = 1 \ldots n_i$. We formally describe the paths from the root $z$ to $x_i$ as the set

$$P_{z,x_i} = \left\{\left(\{y_{i_l,j}\}_{j=0}^{m_{i_l}},\ \{\lambda_{i_l,j}\}_{j=1}^{m_{i_l}}\right) \,\middle|\, l = 1 \ldots n_i\right\}, \qquad (3.4)$$

where $\{y_{i_l,j}\}_{j=0}^{m_{i_l}}$ is the set of points forming the $i_l$-th path, with $y_{i_l,0} = z$ and $y_{i_l,m_{i_l}} = x_i$, and where $\{\lambda_{i_l,j}\}_{j=1}^{m_{i_l}}$ is the set of weights corresponding to the edges along the paths, in particular $\lambda_{i_l,j}$ being the weight of the edge $(y_{i_l,j-1}, y_{i_l,j})$. We define the aggregate weight of the paths from root $z$ to node $x_i$ as

$$\mathcal{W}(P_{z,x_i}) \triangleq \sum_{l=1}^{n_i}\prod_{j=1}^{m_{i_l}} \lambda_{i_l,j}. \qquad (3.5)$$

It is not difficult to note that all the aggregate weights of the paths from the root $z$ to the leaves $\{x_1, \ldots, x_n\}$ sum up to one, i.e.

$$\sum_{i=1}^{n} \mathcal{W}(P_{z,x_i}) = 1.$$

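Figure 3.1: The decomposition of a point $z \in A_3$ with $A = \{x_1, x_2, x_3\}$

Example 3.2.3. Figure 3.1 shows a binary tree corresponding to a point $z \in A_3$, where $A = \{x_1, x_2, x_3\}$. For this particular example, the paths from the root $z$ to the leaves $x_i$ are given by

$$\begin{aligned}
P_{z,x_1} = \{ & (\{z,z_1,z_3,x_1\}, \{\lambda_{(1,2)}, \lambda_{(3,4)}, \lambda_{(7,8)}\}),\ (\{z,z_1,z_4,x_1\}, \{\lambda_{(1,2)}, (1-\lambda_{(3,4)}), \lambda_{(9,10)}\}), \\
& (\{z,z_2,z_5,x_1\}, \{(1-\lambda_{(1,2)}), \lambda_{(5,6)}, \lambda_{(11,12)}\}),\ (\{z,z_2,z_6,x_1\}, \{(1-\lambda_{(1,2)}), (1-\lambda_{(5,6)}), \lambda_{(13,14)}\}) \}, \\
P_{z,x_2} = \{ & (\{z,z_1,z_3,x_2\}, \{\lambda_{(1,2)}, \lambda_{(3,4)}, (1-\lambda_{(7,8)})\}) \}, \\
P_{z,x_3} = \{ & (\{z,z_1,z_4,x_3\}, \{\lambda_{(1,2)}, (1-\lambda_{(3,4)}), (1-\lambda_{(9,10)})\}),\ (\{z,z_2,z_5,x_3\}, \{(1-\lambda_{(1,2)}), \lambda_{(5,6)}, (1-\lambda_{(11,12)})\}), \\
& (\{z,z_2,z_6,x_3\}, \{(1-\lambda_{(1,2)}), (1-\lambda_{(5,6)}), (1-\lambda_{(13,14)})\}) \},
\end{aligned}$$

and the path weights are

$$\begin{aligned}
\mathcal{W}(P_{z,x_1}) &= \lambda_{(1,2)}\lambda_{(3,4)}\lambda_{(7,8)} + \lambda_{(1,2)}(1-\lambda_{(3,4)})\lambda_{(9,10)} + (1-\lambda_{(1,2)})\lambda_{(5,6)}\lambda_{(11,12)} + (1-\lambda_{(1,2)})(1-\lambda_{(5,6)})\lambda_{(13,14)}, \\
\mathcal{W}(P_{z,x_2}) &= \lambda_{(1,2)}\lambda_{(3,4)}(1-\lambda_{(7,8)}), \\
\mathcal{W}(P_{z,x_3}) &= \lambda_{(1,2)}(1-\lambda_{(3,4)})(1-\lambda_{(9,10)}) + (1-\lambda_{(1,2)})\lambda_{(5,6)}(1-\lambda_{(11,12)}) + (1-\lambda_{(1,2)})(1-\lambda_{(5,6)})(1-\lambda_{(13,14)}).
\end{aligned}$$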
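The aggregate weights of this example can be computed mechanically by walking the binary tree. The sketch below is illustrative: the nested-tuple encoding of the tree, the function name and the numerical values chosen for the weights are assumptions, not part of the original development. A tree is either a leaf label or a tuple (left, right, lam) standing for $\psi(\text{left}, \text{right}, \lambda)$.

```python
from collections import defaultdict

def aggregate_weights(tree, weight=1.0, acc=None):
    """Sum, for every leaf label, the products of edge weights along all root-to-leaf paths."""
    if acc is None:
        acc = defaultdict(float)
    if isinstance(tree, str):          # leaf: an element of A
        acc[tree] += weight
    else:                              # internal node psi(left, right, lam)
        left, right, lam = tree
        aggregate_weights(left, weight * lam, acc)
        aggregate_weights(right, weight * (1.0 - lam), acc)
    return acc

# Tree of Example 3.2.3 with arbitrary illustrative weights.
lam12, lam34, lam56 = 0.6, 0.5, 0.25
lam78, lam910, lam1112, lam1314 = 0.4, 0.7, 0.2, 0.9
z3 = ("x1", "x2", lam78)
z4 = ("x1", "x3", lam910)
z5 = ("x1", "x3", lam1112)
z6 = ("x1", "x3", lam1314)
z = ((z3, z4, lam34), (z5, z6, lam56), lam12)

w = aggregate_weights(z)
```

With these values the three aggregate weights are 0.62, 0.18 and 0.20 for $x_1$, $x_2$ and $x_3$, and they sum to one, as stated above.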

Definition 3.2.5. Given a small enough positive scalar $\varepsilon < 1$ we define the following subset of $co(A)$ consisting of all points in $co(A)$ whose aggregate weights are lower bounded by $\varepsilon$, i.e.

$$co_\varepsilon(A) = \{z \mid z \in co(A),\ \mathcal{W}(P_{z,x_i}) \ge \varepsilon,\ \forall x_i \in A\}. \qquad (3.6)$$

Remark 3.2.1. By a small enough value of $\varepsilon$ we understand a value such that the inequality $\mathcal{W}(P_{z,x_i}) \ge \varepsilon$ is satisfied for all $i$. Obviously, for $n$ agents $\varepsilon$ needs to satisfy

$$\varepsilon \le \frac{1}{n},$$

but usually we would want to choose a value much smaller than $1/n$ since this implies a richer set $co_\varepsilon(A)$.

Remark 3.2.2. We can iteratively generate points which are guaranteed to belong to the interior of the convex hull of a finite set $A = \{x_1, \ldots, x_n\}$. Given a set of positive scalars $\{\lambda_1, \ldots, \lambda_{n-1}\} \subset (0,1)$, consider the iteration

$$y_{i+1} = \psi(y_i, x_{i+1}, \lambda_i) \text{ for } i = 1 \ldots n-1, \text{ with } y_1 = x_1. \qquad (3.7)$$

It is not difficult to note that $y_n$ is guaranteed to belong to the interior of $co(A)$. In addition, if we impose the condition

$$\varepsilon^{\frac{1}{n-1}} \le \lambda_i \le \frac{1 - (n-1)\varepsilon}{1 - (n-2)\varepsilon}, \quad i = 1 \ldots n-1, \qquad (3.8)$$

and $\varepsilon$ respects the inequality

$$\varepsilon^{\frac{1}{n-1}} \le \frac{1 - (n-1)\varepsilon}{1 - (n-2)\varepsilon}, \qquad (3.9)$$

then $y_n \in co_\varepsilon(A)$. We should note that for any $n \ge 2$ we can find a small enough value of $\varepsilon$ such that inequality (3.9) is satisfied.
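Iteration (3.7) is easy to trace in the Euclidean special case, where the convex structure is assumed to be $\psi(x,y,\lambda) = \lambda x + (1-\lambda)y$. The sketch below is an illustration under that assumption; the points, the scalars $\lambda_i$ and the function names are hypothetical. It computes $y_n$ and the aggregate weight that the iteration assigns to each $x_i$, namely $\prod_{j=1}^{n-1}\lambda_j$ for $x_1$ and $(1-\lambda_i)\prod_{j=i+1}^{n-1}\lambda_j$ for $x_{i+1}$.

```python
import numpy as np

def psi(x, y, lam):
    """Euclidean convex structure (assumption): the usual convex combination."""
    return lam * x + (1.0 - lam) * y

def iterate_hull_point(points, lams):
    """Iteration (3.7): y_1 = x_1, y_{i+1} = psi(y_i, x_{i+1}, lam_i)."""
    y = points[0]
    for x_next, lam in zip(points[1:], lams):
        y = psi(y, x_next, lam)
    return y

def iteration_weights(lams):
    """Aggregate weight of each x_i in y_n for the chain-shaped tree produced by (3.7)."""
    n = len(lams) + 1
    w = np.empty(n)
    w[0] = np.prod(lams)
    for p in range(1, n):
        # x_{p+1} enters with weight (1 - lam_p) and is then scaled by all later lam_j
        w[p] = (1.0 - lams[p - 1]) * np.prod(lams[p:])
    return w

points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # illustrative set A
lams = np.array([0.5, 0.6, 0.7])                                      # illustrative scalars in (0, 1)

y_n = iterate_hull_point(points, lams)
w = iteration_weights(lams)
```

Since every $\lambda_i$ lies strictly between zero and one, all weights are strictly positive and sum to one, and in the Euclidean case $y_n$ coincides with the corresponding weighted average of the points.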


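The next result characterizes the distance between two points $x, y \in \mathcal{X}$ belonging to the convex hulls of two (possibly overlapping) finite sets $X$ and $Y$.

Proposition 3.2.2. Let $X = \{x_1, \ldots, x_{n_x}\}$ and $Y = \{y_1, \ldots, y_{n_y}\}$ be two finite sets in $\mathcal{X}$ and let $\varepsilon < 1$ be a small enough positive scalar.

(a) If $x \in co(X)$ and $y \in \mathcal{X}$ then

$$d(x,y) \le \sum_{i=1}^{n_x} \lambda_i d(x_i, y), \qquad (3.10)$$

for some $\lambda_i \ge 0$ with $\sum_{i=1}^{n_x} \lambda_i = 1$.

(b) If $x \in co(X)$ and $y \in co(Y)$ then

$$d(x,y) \le \sum_{i=1}^{n_x}\sum_{j=1}^{n_y} \lambda_{ij} d(x_i, y_j), \qquad (3.11)$$

for some $\lambda_{ij} \ge 0$ with $\sum_{i=1}^{n_x}\sum_{j=1}^{n_y} \lambda_{ij} = 1$.

(c) If $x \in co_\varepsilon(X)$, $y \in co_\varepsilon(Y)$, then

$$\lambda_i \ge \varepsilon \text{ and } \lambda_{ij} \ge \varepsilon^2, \quad \forall i,j, \qquad (3.12)$$

where $\lambda_i$ and $\lambda_{ij}$ were introduced in part (a) and part (b), respectively.

(d) If $x \in co_\varepsilon(X)$, $y \in co_\varepsilon(Y)$ and $X \cap Y \neq \emptyset$, then

$$\sum_{i=1}^{n_x}\sum_{j=1}^{n_y} \lambda_{ij}\mathbb{1}_{\{d(x_i,y_j) \neq 0\}} \le 1 - \varepsilon^2, \qquad (3.13)$$

where $\lambda_{ij}$ were introduced in part (b).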

Proof. (a) Mimicking the idea introduced at the beginning of this section, since $x \in \mathrm{co}(X)$ it follows that there exists a positive integer $m$ such that $x \in X_m$, where $X_{m+1} = \psi(X_m)$ with $X_0 = X$. Further, there exist $z_1, z_2 \in X_{m-1}$ and $\lambda_{12} \in [0,1]$ such that $x = \psi(z_1, z_2, \lambda_{12})$. Using the definition of the convex structure, it follows that the distance between $x$ and $y$ can be upper bounded by

$$d(x,y) \le \lambda_{12}\, d(z_1, y) + (1 - \lambda_{12})\, d(z_2, y).$$

Inductively decomposing $z_1$, $z_2$ and their children, it can be easily argued that

$$d(x,y) \le \sum_{i=1}^{n_x} \lambda_i\, d(x_i, y),$$

for some non-negative weights $\lambda_i \ge 0$ summing up to one.

(b) To obtain (3.11) we proceed as in part (a) and obtain upper bounds on $d(x_i, y)$. More precisely, we get that

$$d(x_i, y) \le \sum_{j=1}^{n_y} \mu_j\, d(x_i, y_j), \quad \forall i,$$

with $\mu_j \ge 0$ and $\sum_{j=1}^{n_y} \mu_j = 1$, and it follows that

$$d(x,y) \le \sum_{i=1}^{n_x} \sum_{j=1}^{n_y} \lambda_{ij}\, d(x_i, y_j),$$

where $\lambda_{ij} = \lambda_i \mu_j \ge 0$ and $\sum_{i=1}^{n_x} \sum_{j=1}^{n_y} \lambda_{ij} = 1$.

(c) We note that $\lambda_i = \mathcal{W}(P_{x,x_i})$ and $\mu_j = \mathcal{W}(P_{y,y_j})$, $\forall i, j$. But since $x \in \mathrm{co}_\varepsilon(X)$ and $y \in \mathrm{co}_\varepsilon(Y)$ it immediately follows that $\lambda_i \ge \varepsilon$ and $\mu_j \ge \varepsilon$, and therefore $\lambda_{ij} \ge \varepsilon^2$.

(d) If $X \cap Y \ne \emptyset$ then there exists at least one pair $(i,j)$ such that $d(x_i, y_j) = 0$. But since $\lambda_{ij} \ge \varepsilon^2$ the inequality (3.13) follows. □
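Part (b) can be checked numerically in a familiar setting. The sketch below assumes the Euclidean plane with the usual convex combinations, which is only one instance of a convex metric space; the helper names `combine` and `pairwise_bound` and the sample data are hypothetical.

```python
import math


def combine(points, weights):
    """Convex combination of planar points (Euclidean convex structure)."""
    dim = len(points[0])
    return tuple(sum(w * p[c] for w, p in zip(weights, points)) for c in range(dim))


def pairwise_bound(X, lam, Y, mu):
    """Right-hand side of (3.11) with lambda_ij = lambda_i * mu_j."""
    return sum(
        l * m * math.dist(xi, yj)
        for l, xi in zip(lam, X)
        for m, yj in zip(mu, Y)
    )


X = [(0.0, 0.0), (1.0, 0.0)]
Y = [(0.0, 1.0), (1.0, 1.0), (2.0, 0.0)]
lam = [0.3, 0.7]
mu = [0.2, 0.5, 0.3]

x = combine(X, lam)
y = combine(Y, mu)
lhs = math.dist(x, y)
rhs = pairwise_bound(X, lam, Y, mu)
```

Here the product weights $\lambda_i \mu_j$ sum to one, and `lhs` does not exceed `rhs`, in line with (3.11).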

3.3  Problem  formulation  and  statement  of  the  main  result 

We consider a convex metric space $(\mathcal{X}, d)$ and a set of $n$ agents indexed by $i$ which take values in $\mathcal{X}$. Denoting by $k$ the time index, the agents exchange information based on a communication network modeled by a time-varying graph $G(k) = (V, E(k))$, where $V$ is the finite set of vertices (the agents) and $E(k)$ is the set of edges. An edge (communication link) $e_{ij}(k) \in E(k)$ exists if node $i$ receives information from node $j$. Each agent has an initial value in $\mathcal{X}$. At each subsequent time slot, each agent adjusts its value based on the observed values of its neighbors. The goal of the agents is to asymptotically agree on the same value. In what follows we denote by $x_i(k) \in \mathcal{X}$ the value or state of agent $i$ at time $k$.

Definition 3.3.1. We say that the agents asymptotically reach consensus (or agreement) if

$$\lim_{k \to \infty} d(x_i(k), x_j(k)) = 0, \quad \forall i,j,\ i \ne j. \tag{3.14}$$

Similar to the communication models used in [52], [4], [34], we impose minimal assumptions on the connectivity of the communication graph $G(k)$. Basically, these assumptions consist of having the communication graph connected infinitely often and having a bounded intercommunication interval between neighboring nodes.

Assumption 3.3.1 (Connectivity). The graph $(V, E_\infty)$ is connected, where $E_\infty$ is the set of edges $(i,j)$ representing agent pairs communicating directly infinitely many times, i.e.,

$$E_\infty = \{(i,j) \mid (i,j) \in E(k) \text{ for infinitely many indices } k\}.$$

Assumption 3.3.2 (Bounded intercommunication interval). There exists an integer $B \ge 1$ such that for every $(i,j) \in E_\infty$ agent $j$ sends its information to the neighboring agent $i$ at least once every $B$ consecutive time slots, i.e. at time $k$ or at time $k+1$ or ... or (at latest) at time $k+B-1$ for any $k \ge 0$.

Assumption 3.3.2 is equivalent to the existence of an integer $B \ge 1$ such that

$$(i,j) \in E(k) \cup E(k+1) \cup \dots \cup E(k+B-1), \quad \forall (i,j) \in E_\infty \text{ and } \forall k.$$

Let $N_i(k)$ denote the communication neighborhood of agent $i$, which contains all nodes sending information to $i$ at time $k$, i.e. $N_i(k) = \{j \mid e_{ij}(k) \in E(k)\} \cup \{i\}$, which by convention contains the node $i$ itself. We denote by $A_i(k) = \{x_j(k),\ j \in N_i(k)\}$ the set of the states of agent $i$'s neighbors (its own included), and by $A(k) = \{x_i(k),\ i = 1 \dots n\}$ the set of all states of the agents.

The following theorem states our main result regarding the asymptotic agreement problem on convex metric spaces.

Theorem 3.3.1. Let Assumptions 3.3.1 and 3.3.2 hold for $G(k)$ and let $\varepsilon < 1$ be a positive scalar sufficiently small. If agents update their state according to the scheme

$$x_i(k+1) \in \mathrm{co}_\varepsilon(A_i(k)), \quad \forall i, \tag{3.15}$$

then they asymptotically reach consensus, i.e.

$$\lim_{k \to \infty} d(x_i(k), x_j(k)) = 0, \quad \forall i,j,\ i \ne j. \tag{3.16}$$

Remark 3.3.1. We would like to point out that the result refers strictly to the convergence of the distances between states and not to the convergence of the states themselves. It may be the case that the sequences $\{x_i(k)\}_{k \ge 0}$, $i = 1 \dots n$, do not have a limit and still the distances $d(x_i(k), x_j(k))$ decrease to zero as $k$ goes to infinity. In other words, the agents asymptotically agree on the same value, which may very well be variable. However, as stated in the next corollary, this is not the case and in fact the states of the agents do converge to the same value.

Corollary 3.3.1. Let Assumptions 3.3.1 and 3.3.2 hold for $G(k)$ and let $\varepsilon < 1$ be a positive scalar sufficiently small. If agents update their state according to the scheme

$$x_i(k+1) \in \mathrm{co}_\varepsilon(A_i(k)), \quad \forall i, \tag{3.17}$$

then there exists $x^* \in \mathcal{X}$ such that

$$\lim_{k \to \infty} d(x_i(k), x^*) = 0, \quad \forall i. \tag{3.18}$$

We will give the proofs for both Theorem 3.3.1 and Corollary 3.3.1 in the subsequent section.
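To make the update scheme concrete, here is a minimal simulation sketch on the real line, assuming the Euclidean convex structure and equal weights over each neighborhood. With neighborhoods of at most two agents, every weight is at least 1/2, so the update lies in $\mathrm{co}_\varepsilon(A_i(k))$ for any $\varepsilon \le 1/2$. The four-agent, two-phase communication pattern `PHASES` is a hypothetical example whose union of edges forms a directed cycle and whose intercommunication interval is $B = 2$.

```python
def step(states, neighbors):
    """x_i(k+1) = equal-weight average over N_i(k), own state included."""
    return [sum(states[j] for j in nbrs) / len(nbrs) for nbrs in neighbors]


def diameter(states):
    """Largest pairwise distance between agent states on the real line."""
    return max(states) - min(states)


# PHASES[p][i] lists the agents whose states agent i uses in phase p.
PHASES = [
    [[0, 1], [1], [2, 3], [3]],   # even time slots
    [[0], [1, 2], [2], [3, 0]],   # odd time slots
]


def simulate(x0, steps):
    """Alternate the two phases and record the diameter after each step."""
    states = list(x0)
    history = [diameter(states)]
    for k in range(steps):
        states = step(states, PHASES[k % 2])
        history.append(diameter(states))
    return states, history


final_states, history = simulate([0.0, 1.0, 4.0, 9.0], 100)
```

The recorded diameters never increase and shrink towards zero, which is the behavior the theorem predicts for this kind of update.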

Remark 3.3.2. A procedure for generating points that are guaranteed to belong to $\mathrm{co}_\varepsilon(A_i(k))$ is described in Remark 3.2.2. The idea of picking $x_i(k+1)$ from $\mathrm{co}_\varepsilon(A_i(k))$ rather than $\mathrm{co}(A_i(k))$ is in the same spirit as the assumption imposed on the non-zero consensus weights in [51], [34], [4], i.e. they are assumed lower bounded by a positive, sub-unitary scalar. Setting $x_i(k+1) \in \mathrm{co}(A_i(k))$ may not necessarily guarantee asymptotic convergence to consensus. Indeed, consider the case where $\mathcal{X} = \mathbb{R}$ with the standard Euclidean distance. A convex structure on $\mathbb{R}$ is given by $\psi(x,y,\lambda) = \lambda x + (1-\lambda)y$, for any $x, y \in \mathbb{R}$ and $\lambda \in [0,1]$. Assume that we have two agents which exchange information at all time slots and therefore $A_1(k) = \{x_1(k), x_2(k)\}$, $A_2(k) = \{x_1(k), x_2(k)\}$, $\forall k \ge 0$. Let $x_1(k+1) = \lambda(k)x_1(k) + (1-\lambda(k))x_2(k)$, where $\lambda(k) = 1 - 0.1e^{-k}$, and let $x_2(k+1) = \mu(k)x_1(k) + (1-\mu(k))x_2(k)$, where $\mu(k) = 0.1e^{-k}$. Obviously, $x_i(k+1) \in \mathrm{co}(A_i(k))$, $i = 1,2$, for all $k \ge 0$. It can be easily argued that

$$d(x_1(k+1), x_2(k+1)) \le \big(\lambda(k)(1-\mu(k)) + \mu(k)(1-\lambda(k))\big)\, d(x_1(k), x_2(k)). \tag{3.19}$$

We note that

$$\lim_{K \to \infty} \prod_{k=0}^{K} \big(\lambda(k)(1-\mu(k)) + (1-\lambda(k))\mu(k)\big) = \lim_{K \to \infty} \prod_{k=0}^{K} \big(1 - 0.2e^{-k} + 0.02e^{-2k}\big) \approx 0.73$$

and therefore under inequality (3.19) asymptotic convergence to consensus is not guaranteed. In fact it can be explicitly shown that the agents do not reach consensus. From the dynamic equation governing the evolution of $x_i(k)$, $i = 1,2$, we can write

$$x(k+1) = \begin{pmatrix} \lambda(k) & 1-\lambda(k) \\ \mu(k) & 1-\mu(k) \end{pmatrix} x(k), \quad x(0) = x_0,$$

where $x(k)^T = [x_1(k), x_2(k)]$, and we obtain that

$$\lim_{k \to \infty} x(k) = \begin{pmatrix} 0.8549 & 0.1451 \\ 0.1451 & 0.8549 \end{pmatrix} x_0,$$

and therefore it can be easily seen that consensus is not reached from any initial states with $x_1(0) \ne x_2(0)$.
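The counterexample in this remark is easy to reproduce numerically. The sketch below iterates the two-agent update with $\lambda(k) = 1 - 0.1e^{-k}$ and $\mu(k) = 0.1e^{-k}$ and accumulates the contraction factor from (3.19); the function names `weights` and `run` are hypothetical, and the number of steps is an arbitrary choice that is large enough for the products to settle.

```python
import math


def weights(k):
    """Return (lambda(k), mu(k)) as defined in the remark."""
    a = 0.1 * math.exp(-k)
    return 1.0 - a, a


def run(x1, x2, steps):
    """Iterate the two-agent update and accumulate the bound factor of (3.19)."""
    bound = 1.0
    for k in range(steps):
        lam, mu = weights(k)
        # simultaneous update: both right-hand sides use the states at time k
        x1, x2 = lam * x1 + (1 - lam) * x2, mu * x1 + (1 - mu) * x2
        bound *= lam * (1 - mu) + mu * (1 - lam)
    return x1, x2, bound


# Starting from x(0) = (1, 0) returns the first column of the limit matrix.
x1_inf, x2_inf, bound_inf = run(1.0, 0.0, 60)
```

Starting from $x(0) = (1, 0)$, the two states remain separated, so the weights vanishing too quickly does prevent agreement in this example.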

3.4  Proof  of  the  main  result 

This section is divided into three parts. In the first part we use the results of Section 3.2.2 regarding the convex hull of a finite set and show that the entries of the vector of distances between the states of the agents at time $k+1$ are upper bounded by linear combinations of the entries of the same vector at time $k$. The coefficients of the linear combinations are the entries of a time-varying matrix for which we prove a number of properties (Lemma 3.4.1). In the second part we analyze the properties of the transition matrix of the aforementioned time-varying matrix (Lemma 3.4.2). The last part is devoted to the proof of Theorem 3.3.1.

Lemma 3.4.1. Given a small enough positive scalar $\varepsilon < 1$, assume that agents update their states according to the scheme $x_i(k+1) \in \mathrm{co}_\varepsilon(A_i(k))$, for all $i$. Let $\mathbf{d}(k) = (d(x_i(k), x_j(k)))$ for $i \ne j$ be the $N$-dimensional vector of all distances between the states of the agents, where $N = \frac{n(n-1)}{2}$. Then we obtain that

$$\mathbf{d}(k+1) \le \mathbf{W}(k)\,\mathbf{d}(k), \quad \mathbf{d}(0) = \mathbf{d}_0, \tag{3.20}$$

where the $N \times N$ dimensional matrix $\mathbf{W}(k)$ has the following properties:

(a) $\mathbf{W}(k)$ is non-negative and there exists a positive scalar $\eta \in (0,1)$ such that

$$[\mathbf{W}(k)]_{\bar i \bar i} \ge \eta, \quad \forall\, \bar i, k, \tag{3.21}$$

$$[\mathbf{W}(k)]_{\bar i \bar j} \ge \eta, \quad \forall\, [\mathbf{W}(k)]_{\bar i \bar j} \ne 0,\ \bar i \ne \bar j,\ \forall k. \tag{3.22}$$

(b) If $N_i(k) \cap N_j(k) \ne \emptyset$, then the row $\bar i$ of matrix $\mathbf{W}(k)$, corresponding to the pair of agents $(i,j)$, has the property

$$\sum_{\bar j = 1}^{N} [\mathbf{W}(k)]_{\bar i \bar j} \le 1 - \eta, \tag{3.23}$$

where $\eta$ is the same as in part (a).

(c) If $N_i(k) \cap N_j(k) = \emptyset$ then the row $\bar i$ corresponding to the pair of agents $(i,j)$ sums up to one, i.e.

$$\sum_{\bar j = 1}^{N} [\mathbf{W}(k)]_{\bar i \bar j} = 1. \tag{3.24}$$

In particular, if $G(k)$ is completely disconnected (i.e. agents do not send any information), then $\mathbf{W}(k) = I$.

(d) The rows of $\mathbf{W}(k)$ sum up to a value smaller than or equal to one, i.e.

$$\sum_{\bar j = 1}^{N} [\mathbf{W}(k)]_{\bar i \bar j} \le 1, \quad \forall\, \bar i, k. \tag{3.25}$$

Proof. Given two agents $i$ and $j$, by part (b) of Proposition 3.2.2 the distance between their states can be upper bounded by

$$d(x_i(k+1), x_j(k+1)) \le \sum_{p \in N_i(k),\, q \in N_j(k)} w^{ij}_{pq}(k)\, d(x_p(k), x_q(k)), \quad i \ne j, \tag{3.26}$$

where $w^{ij}_{pq}(k) \ge 0$ and $\sum_{p \in N_i(k),\, q \in N_j(k)} w^{ij}_{pq}(k) = 1$. By defining $\mathbf{W}(k) = (w^{ij}_{pq}(k))$ for $i \ne j$ and $p \ne q$ (where the pairs $(i,j)$ and $(p,q)$ refer to the rows and columns of $\mathbf{W}(k)$, respectively), inequality (3.20) follows. We continue with proving the properties of matrix $\mathbf{W}(k)$.

(a) Since all $w^{ij}_{pq}(k) \ge 0$ for all $i \ne j$, $p \in N_i(k)$ and $q \in N_j(k)$, we obtain that $\mathbf{W}(k)$ is non-negative. By part (c) of Proposition 3.2.2, there exists $\eta = \varepsilon^2$ such that $w^{ij}_{pq}(k) \ge \eta$ for all non-zero entries of $\mathbf{W}(k)$. Also, since $i \in N_i(k)$ and $j \in N_j(k)$ for all $k \ge 0$, it follows that the term $w^{ij}_{ij}(k)\, d(x_i(k), x_j(k))$, with $w^{ij}_{ij}(k) \ge \eta$, will always be present in the right-hand side of the inequality (3.26), and therefore $\mathbf{W}(k)$ has positive diagonal entries.

(b) Follows from part (d) of Proposition 3.2.2, with $\eta = \varepsilon^2$.

(c) If $N_i(k) \cap N_j(k) = \emptyset$ then no terms of the form $w^{ij}_{pp}(k)\, d(x_p(k), x_p(k))$ will appear in the sum of the right-hand side of inequality (3.26). Hence $\sum_{p \in N_i(k),\, q \in N_j(k)} w^{ij}_{pq}(k) = 1$ and therefore

$$\sum_{\bar j = 1}^{N} [\mathbf{W}(k)]_{\bar i \bar j} = 1.$$

If $G(k)$ is completely disconnected, then the sum of the right-hand side of inequality (3.26) will have only the term $w^{ij}_{ij}(k)\, d(x_i(k), x_j(k))$ with $w^{ij}_{ij}(k) = 1$, for all $i,j = 1 \dots n$. Therefore $\mathbf{W}(k)$ is the identity matrix.

(d) The result follows from parts (b) and (c). □

Let $\bar G(k) = (\bar V, \bar E(k))$ be the underlying graph of $\mathbf{W}(k)$ and let $\bar i$ and $\bar j$ refer to the rows and columns of $\mathbf{W}(k)$, respectively. Note that under this notation, index $\bar i$ corresponds to a pair $(i,j)$ of distinct agents. It is not difficult to see that the set of edges of $\bar G(k)$ is given by

$$\bar E(k) = \{((i,j),(p,q)) \mid (i,p) \in E(k),\ (j,q) \in E(k),\ i \ne j,\ p \ne q\}. \tag{3.27}$$
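The edge set in (3.27) can be built mechanically from $E(k)$. The sketch below is one possible reading, under several assumptions that are not fixed by the text: agents are indexed from 0, pairs are kept ordered for simplicity, and self-loops $(i,i)$ are added to $E(k)$ to reflect the convention that each agent belongs to its own neighborhood. The function name `pair_graph_edges` is hypothetical.

```python
from itertools import permutations


def pair_graph_edges(n, edges):
    """Edges between agent pairs following (3.27).

    `edges` holds tuples (i, j) meaning agent i receives from agent j.
    """
    # Assumption: every agent always has access to its own state.
    E = set(edges) | {(i, i) for i in range(n)}
    pairs = list(permutations(range(n), 2))  # ordered pairs of distinct agents
    return {
        ((i, j), (p, q))
        for (i, j) in pairs
        for (p, q) in pairs
        if (i, p) in E and (j, q) in E
    }


no_links = pair_graph_edges(3, set())
one_link = pair_graph_edges(3, {(0, 1)})
```

With no communication links, each pair is connected only to itself, which mirrors the statement that $\mathbf{W}(k)$ reduces to the identity when $G(k)$ is completely disconnected.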

Proposition 3.4.1. Let Assumptions 3.3.1 and 3.3.2 hold for $G(k)$. Then, similar properties hold for $\bar G(k)$ as well, i.e.

(a) the graph $(\bar V, \bar E_\infty)$ is connected, where

$$\bar E_\infty = \{(\bar i, \bar j) \mid (\bar i, \bar j) \in \bar E(k) \text{ for infinitely many indices } k\};$$

(b) there exists an integer $\bar B \ge 1$ such that every $(\bar i, \bar j) \in \bar E_\infty$ appears at least once every $\bar B$ consecutive time slots, i.e. at time $k$ or at time $k+1$ or ... or (at latest) at time $k + \bar B - 1$ for any $k \ge 0$.

Proof. It is not difficult to observe that, similar to (3.27), $\bar E_\infty$ is given by

$$\bar E_\infty = \{((i,j),(p,q)) \mid (i,p) \in E_\infty,\ (j,q) \in E_\infty,\ p \ne q,\ i \ne j\}. \tag{3.28}$$

(a) Showing that $(\bar V, \bar E_\infty)$ is connected is equivalent to showing that for any two pairs $(i,j)$ and $(p,q)$ there exists a path connecting them. Since $(V, E_\infty)$ is assumed connected, there exists a path $i_0 \to i_1 \to \dots \to i_{l-1} \to i_l$, for some $l \le n$, such that $i_0 = p$ and $i_l = i$. From (3.28), it is easily argued that $(i_0, j) \to (i_1, j) \to \dots \to (i_{l-1}, j) \to (i_l, j)$ represents a path connecting $(i,j)$ with $(p,j)$. Similarly, there exists a path $j_0 \to j_1 \to \dots \to j_{m-1} \to j_m$ for some $m \le n$, such that $j_0 = q$ and $j_m = j$. Therefore, $(p, j_0) \to (p, j_1) \to \dots \to (p, j_{m-1}) \to (p, j_m)$ is a path connecting $(p,j)$ with $(p,q)$ and it follows that $(i,j)$ and $(p,q)$ are connected.

(b) Let $((i,j),(p,q))$ be an edge in $\bar E_\infty$, or equivalently $(i,p) \in E_\infty$ and $(j,q) \in E_\infty$. By Assumption 3.3.2, we have that for any $k \ge 0$

$$(i,p) \in E(k) \cup E(k+1) \cup \dots \cup E(k+B-1),$$

$$(j,q) \in E(k) \cup E(k+1) \cup \dots \cup E(k+B-1),$$

where the scalar $B$ was introduced in Assumption 3.3.2. But this also implies that

$$(\bar i, \bar j) \in \bar E(k) \cup \bar E(k+1) \cup \dots \cup \bar E(k+B-1), \quad \forall (\bar i, \bar j) \in \bar E_\infty.$$

Choosing $\bar B = B$, the result follows. □

Let $\Phi(k,s) = \mathbf{W}(k-1)\mathbf{W}(k-2)\cdots\mathbf{W}(s)$, with $\Phi(k,k) = \mathbf{W}(k)$, denote the transition matrix of $\mathbf{W}(k)$ for any $k \ge s$. It should be obvious from the properties of $\mathbf{W}(k)$ that $\Phi(k,s)$ is a non-negative matrix with positive diagonal entries and $\|\Phi(k,s)\|_\infty \le 1$ for any $k \ge s$.

Lemma 3.4.2. Let $\mathbf{W}(k)$ be the matrix introduced in Lemma 3.4.1. Let Assumptions 3.3.1 and 3.3.2 hold for $G(k)$. Then there exists a row index $\bar i^*$ such that

$$\sum_{\bar j = 1}^{N} [\Phi(s+m, s)]_{\bar i^* \bar j} \le 1 - \eta^m, \quad \forall\, s,\ m \ge \bar B - 1, \tag{3.29}$$

where $\eta$ is the lower bound on the non-zero entries of $\mathbf{W}(k)$ and $\bar B$ is the positive integer from part (b) of Proposition 3.4.1.

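Proof. Let $(i^*, j^*) \in E_\infty$ be a pair of agents. By Assumptions 3.3.1 and 3.3.2, there exists a positive integer $s' \in \{s, s+1, \dots, s+B-1\}$ such that agent $j^*$ sends information to agent $i^*$ at time $s'$. This implies that $N_{i^*}(s') \cap N_{j^*}(s') \ne \emptyset$ and by part (b) of Lemma 3.4.1, we have that

$$\sum_{\bar j = 1}^{N} [\mathbf{W}(s')]_{\bar i^* \bar j} \le 1 - \eta,$$

where $\bar i^*$ is the index corresponding to the pair $(i^*, j^*)$. The sum of the $\bar i^*$ row of the transition matrix $\Phi(s'+1, s)$ can be expressed as

$$\sum_{\bar j = 1}^{N} [\Phi(s'+1, s)]_{\bar i^* \bar j} = \sum_{\bar j = 1}^{N} [\mathbf{W}(s')]_{\bar i^* \bar j} \sum_{\bar h = 1}^{N} [\Phi(s', s)]_{\bar j \bar h}.$$

But since $\|\Phi(k,s)\|_\infty \le 1$ for any $k \ge s$, we have that $\sum_{\bar h = 1}^{N} [\Phi(s', s)]_{\bar j \bar h} \le 1$ for any $\bar j$, and therefore

$$\sum_{\bar j = 1}^{N} [\Phi(s'+1, s)]_{\bar i^* \bar j} \le 1 - \eta. \tag{3.30}$$

We can write $\Phi(s'+2, s) = \mathbf{W}(s'+1)\Phi(s'+1, s)$ and it follows that the $\bar i^*$ row sum of $\Phi(s'+2, s)$ can be expressed as

$$\sum_{\bar j = 1}^{N} [\Phi(s'+2, s)]_{\bar i^* \bar j} = \sum_{\bar j = 1}^{N} [\mathbf{W}(s'+1)]_{\bar i^* \bar j} \sum_{\bar h = 1}^{N} [\Phi(s'+1, s)]_{\bar j \bar h}.$$

Since $\sum_{\bar h = 1}^{N} [\Phi(s'+1, s)]_{\bar j \bar h} \le 1$ for any $\bar j$, it follows that

$$\sum_{\bar j = 1}^{N} [\Phi(s'+2, s)]_{\bar i^* \bar j} \le [\mathbf{W}(s'+1)]_{\bar i^* \bar i^*} \sum_{\bar h = 1}^{N} [\Phi(s'+1, s)]_{\bar i^* \bar h} + \sum_{\bar j = 1,\, \bar j \ne \bar i^*}^{N} [\mathbf{W}(s'+1)]_{\bar i^* \bar j}$$

$$\le [\mathbf{W}(s'+1)]_{\bar i^* \bar i^*}(1 - \eta) + \sum_{\bar j = 1,\, \bar j \ne \bar i^*}^{N} [\mathbf{W}(s'+1)]_{\bar i^* \bar j} \le \sum_{\bar j = 1}^{N} [\mathbf{W}(s'+1)]_{\bar i^* \bar j} - \eta\, [\mathbf{W}(s'+1)]_{\bar i^* \bar i^*} \le 1 - \eta^2,$$

since $[\mathbf{W}(s'+1)]_{\bar i^* \bar i^*} \ge \eta$. By induction it can be easily argued that

$$\sum_{\bar j = 1}^{N} [\Phi(s'+m, s)]_{\bar i^* \bar j} \le 1 - \eta^m, \quad \forall m > 0. \tag{3.31}$$

Note that by Assumption 3.3.2, a pair $(i,j)$ can exchange information at $s' = s$ the earliest or at $s' = s + \bar B - 1$ the latest (recall that $\bar B = B$). From (3.31) we obtain that for $s' = s + \bar B - 1$

$$\sum_{\bar j = 1}^{N} [\Phi(s + \bar B - 1 + m, s)]_{\bar i^* \bar j} \le 1 - \eta^m, \quad \forall m > 0, \tag{3.32}$$

and for $s' = s$

$$\sum_{\bar j = 1}^{N} [\Phi(s + m, s)]_{\bar i^* \bar j} \le 1 - \eta^m, \quad \forall m > 0,$$

or

$$\sum_{\bar j = 1}^{N} [\Phi(s + \bar B - 1 + m, s)]_{\bar i^* \bar j} \le 1 - \eta^{m + \bar B - 1}, \quad \forall m > 0. \tag{3.33}$$

From (3.32) and (3.33) we get

$$\sum_{\bar j = 1}^{N} [\Phi(s + \bar B - 1 + m, s)]_{\bar i^* \bar j} \le 1 - \eta^{m + \bar B - 1}, \quad \forall s,\ m > 0,$$

or equivalently

$$\sum_{\bar j = 1}^{N} [\Phi(s + m, s)]_{\bar i^* \bar j} \le 1 - \eta^m, \quad \forall m \ge \bar B - 1. \tag{3.34}$$

□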

Corollary 3.4.1. Let $\mathbf{W}(k)$ be the matrix introduced in Lemma 3.4.1 and let Assumptions 3.3.1 and 3.3.2 hold for $G(k)$. We then have

$$[\Phi(s + (N-1)\bar B - 1, s)]_{\bar i \bar j} \ge \eta^{(N-1)\bar B}, \quad \forall\, s, \bar i, \bar j, \tag{3.35}$$

where $\eta$ is the lower bound on the non-zero entries of $\mathbf{W}(k)$ and $\bar B$ is the positive integer from part (b) of Proposition 3.4.1.

Proof. By Proposition 3.4.1 and Lemma 3.4.1 all the assumptions of Lemma 2 of [34] are satisfied, from which the result follows. □

We  are  now  ready  to  prove  Theorem  3.3.1  and  Corollary  3.3.1. 

3.4.1  Proof  of  Theorem  3.3.1 

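We have that the vector of distances between the states of the agents respects the inequality

$$\mathbf{d}(k+1) \le \mathbf{W}(k)\,\mathbf{d}(k),$$

where the properties of $\mathbf{W}(k)$ are described by Lemma 3.4.1. It immediately follows that

$$\|\mathbf{d}(k+1)\|_\infty \le \|\mathbf{d}(k)\|_\infty, \quad \text{for } k \ge 0. \tag{3.36}$$

Let $\bar B_0 = (N-1)\bar B - 1$, where $\bar B$ is the positive integer from part (b) of Proposition 3.4.1. In the following we show that all row sums of $\Phi(s + 2\bar B_0, s)$ are upper bounded by a positive scalar strictly less than one. Indeed, since $\Phi(s + 2\bar B_0, s) = \Phi(s + 2\bar B_0, s + \bar B_0)\,\Phi(s + \bar B_0, s)$ we obtain that

$$\sum_{\bar j = 1}^{N} [\Phi(s + 2\bar B_0, s)]_{\bar i \bar j} = \sum_{\bar j = 1}^{N} [\Phi(s + 2\bar B_0, s + \bar B_0)]_{\bar i \bar j} \sum_{\bar h = 1}^{N} [\Phi(s + \bar B_0, s)]_{\bar j \bar h}, \quad \forall\, \bar i.$$

By Lemma 3.4.2 we have that there exists a row $\bar j^*$ such that

$$\sum_{\bar h = 1}^{N} [\Phi(s + \bar B_0, s)]_{\bar j^* \bar h} \le 1 - \eta^{\bar B_0}, \quad \forall s,$$

and since $\sum_{\bar h = 1}^{N} [\Phi(s + \bar B_0, s)]_{\bar j \bar h} \le 1$ for any $\bar j$, we get

$$\sum_{\bar j = 1}^{N} [\Phi(s + 2\bar B_0, s)]_{\bar i \bar j} \le \sum_{\bar j = 1,\, \bar j \ne \bar j^*}^{N} [\Phi(s + 2\bar B_0, s + \bar B_0)]_{\bar i \bar j} + [\Phi(s + 2\bar B_0, s + \bar B_0)]_{\bar i \bar j^*}\,(1 - \eta^{\bar B_0})$$

$$= \sum_{\bar j = 1}^{N} [\Phi(s + 2\bar B_0, s + \bar B_0)]_{\bar i \bar j} - \eta^{\bar B_0}\, [\Phi(s + 2\bar B_0, s + \bar B_0)]_{\bar i \bar j^*}.$$

By Corollary 3.4.1 it follows that

$$[\Phi(s + 2\bar B_0, s + \bar B_0)]_{\bar i \bar j} \ge \eta^{\bar B_0 + 1}, \quad \forall\, \bar i, \bar j, s,$$

and since $\sum_{\bar j = 1}^{N} [\Phi(s + 2\bar B_0, s + \bar B_0)]_{\bar i \bar j} \le 1$ we get that

$$\sum_{\bar j = 1}^{N} [\Phi(s + 2\bar B_0, s)]_{\bar i \bar j} \le 1 - \eta^{2\bar B_0 + 1}, \quad \forall\, \bar i, s.$$

Therefore

$$\|\Phi(s + 2\bar B_0, s)\|_\infty \le 1 - \eta^{2\bar B_0 + 1}, \quad \forall s. \tag{3.37}$$

It follows that

$$\|\mathbf{d}(t_k)\|_\infty \le \big(1 - \eta^{2\bar B_0 + 1}\big)^k\, \|\mathbf{d}(0)\|_\infty, \quad \forall k \ge 0, \tag{3.38}$$

where $t_k = 2k\bar B_0$, which shows that the subsequence $\{\|\mathbf{d}(t_k)\|_\infty\}_{k \ge 0}$ asymptotically converges to zero. Combined with inequality (3.36) we further obtain that the sequence $\{\|\mathbf{d}(k)\|_\infty\}_{k \ge 0}$ asymptotically converges to zero. Therefore the agents asymptotically reach consensus.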

3.4.2  Proof  of  Corollary  3.3.1 

The main idea of the proof consists of showing that the set $\mathrm{co}(A(k))$, where $A(k) = \{x_i(k),\ i = 1 \dots n\}$, converges to a set containing one point.

We first note that since $A_i(k) \subset A(k)$ it can be easily argued that $\mathrm{co}(A_i(k)) \subset \mathrm{co}(A(k))$, for all $i$ and $k$. Also, since $\mathrm{co}_\varepsilon(A_i(k)) \subset \mathrm{co}(A_i(k))$ it follows that $\mathrm{co}_\varepsilon(A_i(k)) \subset \mathrm{co}(A(k))$ and consequently $x_i(k+1) \in \mathrm{co}(A(k))$. Therefore, we have that $\mathrm{co}(A(k+1)) \subset \mathrm{co}(A(k))$ for all $k$ and, from the theory of limits of sequences of sets, it follows that

$$\liminf\, \mathrm{co}(A(k)) = \limsup\, \mathrm{co}(A(k)) = \lim\, \mathrm{co}(A(k)) = A_\infty,$$

where $A_\infty = \bigcap_{k \ge 0} \mathrm{co}(A(k))$. We denote the diameter of the set $A(k)$ by

$$\delta(A(k)) = \sup\{d(x,y) \mid x, y \in A(k)\},$$

and by Proposition 2 of [46] we have that

$$\delta(\mathrm{co}(A(k))) = \delta(A(k)).$$

From Theorem 3.3.1 we have that

$$\lim_{k \to \infty} d(x_i(k), x_j(k)) = 0, \quad \forall\, i \ne j,$$

and consequently

$$\lim_{k \to \infty} \delta(A(k)) = \lim_{k \to \infty} \delta(\mathrm{co}(A(k))) = 0,$$

which also means that

$$\delta(A_\infty) = 0,$$

i.e. the set $A_\infty$ contains only one point, say $x^* \in \mathcal{X}$, or $A_\infty = \mathrm{co}(x^*)$, or

$$\lim_{k \to \infty} \mathrm{co}(A(k)) = \mathrm{co}(x^*).$$

But since $x_i(k+1) \in \mathrm{co}_\varepsilon(A_i(k)) \subset \mathrm{co}(A(k))$ for all $i, k$ it follows that

$$\lim_{k \to \infty} d(x_i(k), x^*) = 0, \quad \forall\, i,$$

i.e. the states of the agents converge to the same point $x^* \in \mathcal{X}$.

3.5 Distance between the consensus points and the initial points

In this section we analyze the evolution of the distance between the states of the agents and their initial values under the scheme described by Theorem 3.3.1. This analysis will give us upper bounds on the distance between the consensus point(s) and the initial values of the agents.

Consider the distance $d(x_i(k), x_l(0))$ for some $i, l$ and let us assume that $x_i(k+1)$ is chosen according to the scheme described by Theorem 3.3.1, i.e. $x_i(k+1) \in \mathrm{co}_\varepsilon(A_i(k))$. By part (a) of Proposition 3.2.2 we can bound this distance as

$$d(x_i(k+1), x_l(0)) \le \sum_{j \in N_i(k)} \lambda_{ij}(k)\, d(x_j(k), x_l(0)), \tag{3.39}$$

where $\lambda_{ij}(k) \ge \varepsilon$ and $\sum_{j \in N_i(k)} \lambda_{ij}(k) = 1$. By defining the $n$-dimensional vector $\eta^l(k) = (d(x_i(k), x_l(0)))$ (where $i$ varies) and the $n \times n$ dimensional matrix $\Lambda(k) = (\lambda_{ij}(k))$, inequality (3.39) can be compactly written as

$$\eta^l(k+1) \le \Lambda(k)\, \eta^l(k), \quad \eta^l(0) = \eta^l_0, \tag{3.40}$$

where $\Lambda(k)$ is a row stochastic matrix. It is not difficult to note that the underlying graph of $\Lambda(k)$ is $G(k)$ and that in fact inequality (3.40) is valid for any $l$. In the following proposition we give upper bounds on the distance between the consensus states and the initial values of the states.

Proposition 3.5.1. Let Assumptions 3.3.1 and 3.3.2 hold for $G(k)$ and let the states of the agents be updated according to the scheme given by Theorem 3.3.1. We then have that

$$\lim_{k \to \infty} d(x_i(k), x_l(0)) \le \sum_{j=1}^{n} v_j\, d(x_j(0), x_l(0)), \quad \forall\, i, l, \tag{3.41}$$

where $v = (v_j)$ is a vector with positive entries summing up to one satisfying

$$\lim_{k \to \infty} \Lambda(k)\Lambda(k-1)\cdots\Lambda(0) = \mathbf{1}v^T, \tag{3.42}$$

and where $\mathbf{1}$ is the $n$-dimensional vector of all ones and $\Lambda(k)$ is the matrix defined in inequality (3.40).

Proof. Our assumptions fit the assumptions of Lemmas 3 and 4 of [34], from which (3.42) follows. Therefore by inequality (3.40) the result follows. □

Remark 3.5.1. If in addition to the assumptions of Proposition 3.5.1 we also assume that $\Lambda(k)$ is doubly stochastic, then by Proposition 1 of [34] we get that

$$\lim_{k \to \infty} \Lambda(k)\Lambda(k-1)\cdots\Lambda(0) = \frac{1}{n}\mathbf{1}\mathbf{1}^T.$$

Therefore, inequality (3.41) simplifies to

$$\lim_{k \to \infty} d(x_i(k), x_l(0)) \le \frac{1}{n} \sum_{j=1}^{n} d(x_j(0), x_l(0)), \quad \forall\, i, l.$$

The assumptions in this remark correspond to the assumptions for the average consensus problem in Euclidean spaces. For the aforementioned case, the consensus point is given by the average of the initial points, i.e. $x_{av} = \frac{1}{n}\sum_{i=1}^{n} x_i(0)$. It can be easily checked that indeed $x_{av}$ satisfies

$$\|x_{av} - x_l(0)\| \le \frac{1}{n} \sum_{j=1}^{n} \|x_j(0) - x_l(0)\|,$$

where $\|\cdot\|$ represents the Euclidean norm.
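The last inequality is a direct consequence of the triangle inequality, and a quick numerical check makes it concrete. The sketch below uses hypothetical initial points in the plane; the function names `average` and `bound_holds` are illustrative only.

```python
import math


def average(points):
    """Componentwise average of a list of points."""
    n = len(points)
    dim = len(points[0])
    return tuple(sum(p[c] for p in points) / n for c in range(dim))


def bound_holds(points, tol=1e-12):
    """Check ||x_av - x_l|| <= (1/n) * sum_j ||x_j - x_l|| for every l."""
    x_av = average(points)
    n = len(points)
    return all(
        math.dist(x_av, xl) <= sum(math.dist(xj, xl) for xj in points) / n + tol
        for xl in points
    )


initial_points = [(0.0, 0.0), (2.0, 0.0), (0.0, 3.0), (5.0, 5.0)]
ok = bound_holds(initial_points)
```

The check passes for any choice of initial points, since the norm of an average never exceeds the average of the norms.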

3.6  Application  -  Asymptotic  consensus  of  opinion 

Social networks play a central role in the sharing of information and formation of opinions. This is true in the context of advising friends on which movies to see, relaying information about the abilities and fit of a potential new employee in a firm, or debating the merits of politicians. In the following we consider a scenario in which a group of agents try to agree on a common opinion. Assume for example that a group of friends would like to go to see a movie. Different members of the group may suggest different movies. A member of the group discusses with all or just some of his/her friends to find out about their opinions. This member gives some weight (importance) to the opinions of his/her friends based on the trust in their expertise. For instance, some members of the group are more informed about the quality of the proposed movies, and therefore their opinions may have a heavier influence on the final decision. By repeatedly discussing among themselves, the group of friends has to choose one of the movies.

In the following we mathematically formalize the scenario described above and show that we can use the framework introduced in the previous sections to give an algorithm which ensures asymptotic consensus on opinions. We model the opinion of a member of the group (agent) as a discrete random variable. Under an appropriate metric and by providing a convex structure, we show that the metric space of discrete random variables is convex. In addition, we analyze in more detail the convex hull of a finite set; this analysis is possible since the convex structure is given explicitly. We give an iterative algorithm that ensures agreement of opinion, which is based on Theorem 3.3.1, and provide some numerical simulations.

3.6.1 Geometric framework


Let $s$ be a positive integer, let $S = \{1, 2, \dots, s\}$ be a finite set and let $(\Omega, \mathcal{F}, P)$ be a probability space. We denote by $\mathcal{X}$ the space of discrete measurable functions (random variables) on $(\Omega, \mathcal{F}, P)$ with values in $S$.

We introduce the operator $d : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$, defined as

$$d(X,Y) = E[\rho(X,Y)],$$

where $\rho : \mathbb{R} \times \mathbb{R} \to \{0,1\}$ is the discrete metric, i.e.

$$\rho(x,y) = \begin{cases} 1 & x \ne y \\ 0 & x = y. \end{cases}$$

It is not difficult to note that the operator $d$ can also be written as $d(X,Y) = E[\mathbf{1}_{\{X \ne Y\}}] = \Pr(X \ne Y)$, where $\mathbf{1}_{\{X \ne Y\}}$ is the indicator function of the event $\{X \ne Y\}$.

We note that the operator $d$ satisfies the following properties:

1. For any $X, Y \in \mathcal{X}$, $d(X,Y) = 0$ if and only if $X = Y$ with probability one.

2. For any $X, Y, Z \in \mathcal{X}$, $d(X,Z) + d(Y,Z) \ge d(X,Y)$ with probability one,

and therefore $d$ is a metric on $\mathcal{X}$. The set $\mathcal{X}$ together with the operator $d$ defines the metric space $(\mathcal{X}, d)$.

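Let $\theta \in \{1,2\}$ be an independent random variable, with probability mass function $\Pr(\theta = 1) = \lambda$ and $\Pr(\theta = 2) = 1 - \lambda$, where $\lambda \in [0,1]$. We define the mapping $\psi : \mathcal{X} \times \mathcal{X} \times [0,1] \to \mathcal{X}$ given by

$$\psi(X_1, X_2, \lambda) = \mathbf{1}_{\{\theta = 1\}} X_1 + \mathbf{1}_{\{\theta = 2\}} X_2, \quad \forall X_1, X_2 \in \mathcal{X},\ \lambda \in [0,1]. \tag{3.43}$$

Proposition 3.6.1. The mapping $\psi$ is a convex structure on $\mathcal{X}$.

Proof. For any $U, X_1, X_2 \in \mathcal{X}$ and $\lambda \in [0,1]$ we have

$$d(U, \psi(X_1, X_2, \lambda)) = E[\rho(U, \psi(X_1, X_2, \lambda))] = E\big[E[\rho(U, \psi(X_1, X_2, \lambda)) \mid U, X_1, X_2]\big]$$

$$= E\big[E[\rho(U, \mathbf{1}_{\{\theta = 1\}} X_1 + \mathbf{1}_{\{\theta = 2\}} X_2) \mid U, X_1, X_2]\big] = E[\lambda \rho(U, X_1) + (1 - \lambda)\rho(U, X_2)]$$

$$= \lambda\, d(U, X_1) + (1 - \lambda)\, d(U, X_2).$$

□

From the above proposition it follows that $(\mathcal{X}, d, \psi)$ is a convex metric space.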
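The identity in the proof can be verified exactly on a small finite sample space. The sketch below is an illustration under stated assumptions: random variables are stored as lists of values, one per outcome, and the independent selector $\theta$ is modeled by doubling the sample space to $\Omega \times \{1, 2\}$. The helper names `dist`, `psi` and `lift`, as well as the sample values, are hypothetical.

```python
def dist(prob, X, Y):
    """d(X, Y) = Pr(X != Y) on a finite sample space."""
    return sum(p for p, x, y in zip(prob, X, Y) if x != y)


def psi(prob, X1, X2, lam):
    """Convex structure (3.43) on the product space Omega x {1, 2}.

    Returns the product-space probabilities and the mixed variable:
    the first half of the outcomes has theta = 1, the second half theta = 2.
    """
    ext_prob = [p * lam for p in prob] + [p * (1.0 - lam) for p in prob]
    Z = list(X1) + list(X2)
    return ext_prob, Z


def lift(X):
    """Extend a variable defined on Omega to Omega x {1, 2}."""
    return list(X) + list(X)


prob = [0.2, 0.3, 0.5]
U = [1, 2, 3]
X1 = [1, 1, 3]
X2 = [2, 2, 2]
lam = 0.4

ext_prob, Z = psi(prob, X1, X2, lam)
lhs = dist(ext_prob, lift(U), Z)
rhs = lam * dist(prob, U, X1) + (1.0 - lam) * dist(prob, U, X2)
```

Both sides agree for this example, matching the equality obtained in the proof (which is stronger than the inequality a convex structure requires).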
The  next  theorem  characterizes  the  convex  hull  of  a  finite  set  in  X . 


Theorem 3.6.1. Let $n$ be a positive integer and let $A = \{X_1, \dots, X_n\}$ be a set of points in $\mathcal{X}$. Consider the independent random variable $\theta$ taking values in the finite set $\{1, \dots, n\}$ with probability measure given by $\Pr(\omega : \theta(\omega) = i) = w_i$, for some non-negative scalars $w_i$ with $\sum_{i=1}^{n} w_i = 1$. Then

$$\mathrm{co}(A) = \Big\{ Z \in \mathcal{X} \ \Big|\ Z = \sum_{i=1}^{n} \mathbf{1}_{\{\theta = i\}} X_i,\ \forall w_i \ge 0,\ \sum_{i=1}^{n} w_i = 1 \Big\}. \tag{3.44}$$


Proof. We recall from Proposition 3.2.1 that the convex hull of $A$ is given by

$$\mathrm{co}(A) = \lim_{m \to \infty} A_m = \bigcup_{m=1}^{\infty} A_m,$$

where $A_m = \psi(A_{m-1})$, with $A_1 = \psi(A)$. Also, since $A_m$ is an increasing sequence, clearly $A \subset A_m$ for all $m \ge 1$. We define the set

$$\mathcal{K}(A) = \Big\{ Z \in \mathcal{X} \ \Big|\ Z = \sum_{i=1}^{n} \mathbf{1}_{\{\theta = i\}} X_i,\ \forall w_i \ge 0,\ \sum_{i=1}^{n} w_i = 1 \Big\}.$$

The proof is structured in two parts. In the first part we show that any point in $\mathcal{K}(A)$ belongs to the convex hull of $A$, while in the second part we show that any point in $\mathrm{co}(A)$ belongs to $\mathcal{K}(A)$ as well.


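Let $Z \in \mathcal{K}(A)$, i.e. $Z = \sum_{i=1}^{n} \mathbf{1}_{\{\theta = i\}} X_i$ where $\Pr(\theta = i) = w_i$, for some $w_i \ge 0$, $\sum_{i=1}^{n} w_i = 1$. The random variable $\theta$ is defined such that $\theta(\omega_i) = i$ and $\Pr(\omega_i) = w_i$. Let $\Omega_i = \{\omega^i_1, \omega^i_2\}$, $i = 1 \dots n-1$, be a set of independent sample spaces (i.e. the elementary events $\omega^i_j$ and $\omega^l_j$ are independent for any $l \ne i$ and for any $j$). We define the probability measure for each of the events in $\Omega_i$ as

$$\Pr(\omega^i_1) = \frac{w_1 + \dots + w_i}{w_1 + \dots + w_{i+1}}, \qquad \Pr(\omega^i_2) = \frac{w_{i+1}}{w_1 + \dots + w_{i+1}},$$

for $i = 1 \dots n-1$. We consider the following succession of events from $\Omega_i$:

$$S_1 = \{\omega^1_1 \omega^2_1 \cdots \omega^{n-1}_1\},$$

$$S_2 = \{\omega^1_2 \omega^2_1 \cdots \omega^{n-1}_1\},$$

$$S_i = \bigcup_{j_1, \dots, j_{i-2} = 1}^{2} \{\omega^1_{j_1} \cdots \omega^{i-2}_{j_{i-2}}\, \omega^{i-1}_2\, \omega^i_1 \cdots \omega^{n-1}_1\}, \quad i = 3 \dots n-1, \tag{3.45}$$

$$S_n = \bigcup_{j_1, \dots, j_{n-2} = 1}^{2} \{\omega^1_{j_1} \cdots \omega^{n-2}_{j_{n-2}}\, \omega^{n-1}_2\}.$$

For example, for $n = 4$ (3.45) becomes

$$S_1 = \{\omega^1_1 \omega^2_1 \omega^3_1\},$$

$$S_2 = \{\omega^1_2 \omega^2_1 \omega^3_1\},$$

$$S_3 = \{\omega^1_1 \omega^2_2 \omega^3_1\} \cup \{\omega^1_2 \omega^2_2 \omega^3_1\},$$

$$S_4 = \{\omega^1_1 \omega^2_1 \omega^3_2\} \cup \{\omega^1_1 \omega^2_2 \omega^3_2\} \cup \{\omega^1_2 \omega^2_1 \omega^3_2\} \cup \{\omega^1_2 \omega^2_2 \omega^3_2\}.$$

Using the independence assumption on the events from $\Omega_i$, it is not difficult to see that

$$\Pr(S_i) = w_i, \quad i = 1 \dots n.$$

Assume that each event $\omega_i$ that we observe can be decomposed in a succession of independent events from $\Omega_j$, which are invisible to the observer. In particular let

$$\omega_i = S_i, \quad i = 1 \dots n.$$

The particular decomposition of the event $\omega_i$ in a set of intermediate, independent events given by $S_i$ makes sense since both $\omega_i$ and $S_i$ have the same probability measure. It immediately follows that

$$\mathbf{1}_{\{\omega : \theta(\omega) = i\}} = \mathbf{1}_{\{\omega_i\}} = \mathbf{1}_{\{S_i\}}. \tag{3.46}$$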

Let  us  now  define  the  random  variables  6j :  f 2,  — >  {i,i  +1},  where 


Oi(oj\ )  =  i,  Oi(co\)  =  7  +  1, 


for  i  =  1 ...  n  -  1 .  Obviously 


Pr{6,  =  i )  = 


Wi  +  ...  +  W/-1 

l+l  +.  ..  +  +’,• 


Pr(0j  =  7  +  1)  = 


Wj 

W\  + . . .  +  wf 


and  9j  are  independent  random  variables. 

From  (3.45)  and  (3.46)  together  with  the  independence  of  the  random  variables  6j 
the  following  equalities  in  terms  of  the  indicator  function  are  satisfied 


1{6>=1}  =  n"=Ji{6,y=y} 

1{0=7)  =  tlei_l=i]IFjzjt{ej=j},  i  =  2...n-\  (3-47) 

%=«}  =  1{On-l=n}- 

From  (3.47)  it  follows  that  Z  is  the  result  of  the  nth  step  of  the  iteration 


^7+1  =  lj(9,=7)k7  +  l{6//=7+ 1)30+1, 
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for  i  =  1 . .  .77,  with  Y\  =  X\,  i.e.  Z  =  Yn.  It  can  be  easily  argued  that  Yj  e  A,-_ i,  i  =  2. .  .n 
and  therefore  Z  6  An-\  or  Z  €  co(A)  which  implies  that  r/C (A )  c  co(A). 
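
As an informal check of the first part of the proof, the following minimal Python sketch builds the binary probabilities $Pr(\theta_i = i)$ from a weight vector and enumerates every outcome of the iteration $Y_{i+1} = 1_{\{\theta_i=i\}}Y_i + 1_{\{\theta_i=i+1\}}X_{i+1}$, confirming that $X_i$ is selected with probability $w_i$. The function names and the example weights are hypothetical, and the sketch assumes $w_1 > 0$ so that no denominator vanishes.

```python
from fractions import Fraction
from itertools import product


def keep_probabilities(weights):
    """Return Pr(theta_i = i) = (w_1+...+w_i)/(w_1+...+w_{i+1}) for i = 1..n-1."""
    keep = []
    running = weights[0]
    for w_next in weights[1:]:
        keep.append(running / (running + w_next))
        running += w_next
    return keep


def selection_probabilities(weights):
    """Exact probability that the binary iteration ends on each X_i."""
    n = len(weights)
    keep = keep_probabilities(weights)
    probs = [Fraction(0)] * n
    # Each outcome is a tuple of n-1 binary choices: keep Y_i or switch to X_{i+1}.
    for outcome in product((True, False), repeat=n - 1):
        selected, p = 0, Fraction(1)
        for i, keeps in enumerate(outcome):
            p *= keep[i] if keeps else 1 - keep[i]
            if not keeps:
                selected = i + 1  # 0-based index of X_{i+1}
        probs[selected] += p
    return probs


w = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
recovered = selection_probabilities(w)
print(recovered)
```

Exact rational arithmetic is used so that the telescoping of the products can be verified without rounding error.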

We now begin the second part of the proof and show that any point in $co(A)$ belongs to $\mathcal{K}(A)$ as well. If $Z \in co(A)$, from Section 3.2.2 we have that there exists a positive integer $m$ such that $Z \in A_m$ and therefore $Z$ is the root of a binary tree of height $m$ with leaves from the set $A$. Using the same notations as in Section 3.2.2, for each of the leaf nodes $X_i$ there exist $n_i \ge 1$ paths from $Z$ to $X_i$, of lengths $m_{i_l}$, $l = 1 \ldots n_i$, which are denoted by

\[ P_{Z,X_i} = \Big\{ \big( \{Y_{i_l,j}\}_{j=0}^{m_{i_l}}, \{\lambda_{i_l,j}\}_{j=1}^{m_{i_l}} \big) \ \Big|\ l = 1 \ldots n_i \Big\}, \]

where $Y_{i_l,j-1} = \psi(Y_{i_l,j}, *, \lambda_{i_l,j})$ for $j = 1 \ldots m_{i_l}$, $l = 1 \ldots n_i$, and where we denoted by $*$ some intermediate node in the tree. We introduce the independent random variables $\theta_{i_l,j}$ such that $Pr(\theta_{i_l,j} = i_{l,j}) = \lambda_{i_l,j}$ and $Pr(\theta_{i_l,j} = *) = 1 - \lambda_{i_l,j}$. It follows that $Z$ can be expressed as

\[ Z = \sum_{i=1}^{n} \Big( \sum_{l=1}^{n_i} \prod_{j=1}^{m_{i_l}} 1_{\{\theta_{i_l,j} = i_{l,j}\}} \Big) X_i. \]

Using again the independence of $\theta_{i_l,j}$ we have that

\[ \sum_{l=1}^{n_i} \prod_{j=1}^{m_{i_l}} 1_{\{\theta_{i_l,j} = i_{l,j}\}} = 1_{\big\{ \bigcup_{l=1}^{n_i} \bigcap_{j=1}^{m_{i_l}} \{\omega : \theta_{i_l,j} = i_{l,j}\} \big\}}. \]

Let $S_i = \bigcup_{l=1}^{n_i} \bigcap_{j=1}^{m_{i_l}} \{\omega : \theta_{i_l,j} = i_{l,j}\}$ and let us interpret the events in $S_i$ as the set of underlying sub-events generating $\omega_i$, i.e. $\omega_i = S_i$. It is not difficult to see that

\[ Pr(\omega_i) = Pr(S_i) = \mathcal{W}(P_{Z,X_i}). \]

By defining $w_i = \mathcal{W}(P_{Z,X_i})$ we get that $\sum_{i=1}^{n} Pr(\omega_i) = 1$. Note that if there exists an $i^*$ such that $X_{i^*}$ is not among the leaves of the binary tree rooted at $Z$, the measure of the event $\omega_{i^*}$ is zero. Therefore we have that $Z$ can be expressed as

\[ Z = \sum_{i=1}^{n} 1_{\{\omega_i\}} X_i = \sum_{i=1}^{n} 1_{\{\theta = i\}} X_i, \]

where $Pr(\theta = i) = w_i$, and hence it follows that $Z \in \mathcal{K}(A)$ and consequently $co(A) \subset \mathcal{K}(A)$. From part one and part two of our proof, the result follows.

□ 

Remark 3.6.1. We say that $Z$ is between $X_1$ and $X_2$ if $d(X_1,Z) + d(Z,X_2) = d(X_1,X_2)$. For any two points $X_1, X_2 \in \mathcal{X}$, the set

\[ \{Z \in \mathcal{X} \mid d(X_1,Z) + d(Z,X_2) = d(X_1,X_2)\} \]

is called metric segment and is denoted by $[X_1,X_2]$. We note that any point $Z \in \mathcal{X}$ belonging to the convex hull of $X_1, X_2$ is on the metric segment between $X_1$ and $X_2$. Indeed, if $Z \in co(\{X_1,X_2\})$ then $Z = 1_{\{\theta=1\}}X_1 + 1_{\{\theta=2\}}X_2$, where $Pr(\theta = 1) = \lambda$ and $Pr(\theta = 2) = 1 - \lambda$, for some $\lambda \in [0,1]$. It follows that

\[ \begin{aligned}
d(X_1,Z) + d(Z,X_2) &= E[\rho(X_1,Z) + \rho(Z,X_2)] = E\big[E[\rho(X_1,Z) + \rho(Z,X_2) \mid X_1,X_2]\big] \\
&= E[\lambda\rho(X_1,X_2) + (1-\lambda)\rho(X_1,X_2)] = d(X_1,X_2).
\end{aligned} \]

However, not every point belonging to the metric segment $[X_1,X_2]$ belongs to $co(\{X_1,X_2\})$. Indeed, assume for example that $X_1, X_2 \in \{1,2\}$ and consider a random variable $Z \in \{1,2\}$ whose probability mass function, conditioned on the values of $X_1$ and $X_2$, is given by $Pr(Z = 2 \mid X_1 = 2, X_2 = 1) = \lambda$, $Pr(Z = 1 \mid X_1 = 2, X_2 = 1) = 1 - \lambda$, $Pr(Z = 1 \mid X_1 = 1, X_2 = 2) = \tilde{\lambda}$, $Pr(Z = 2 \mid X_1 = 1, X_2 = 2) = 1 - \tilde{\lambda}$ and $Pr(Z = 2 \mid X_1 = 2, X_2 = 2) = Pr(Z = 1 \mid X_1 = 1, X_2 = 1) = 1$, for some $\lambda \neq \tilde{\lambda} \in (0,1)$. Since $Pr(Z = 2 \mid X_1 = 2, X_2 = 1) \neq Pr(Z = 1 \mid X_1 = 1, X_2 = 2)$, it follows that $Z \notin co(\{X_1,X_2\})$. However, it can be easily checked that $Z \in [X_1,X_2]$. In fact, any random variable $Z$ whose probability mass function conditioned on the values of $X_1$ and $X_2$ satisfies

\[ \sum_{z \notin \{x_1, x_2\}} Pr(Z = z \mid X_1 = x_1, X_2 = x_2) = 0, \qquad \sum_{z \neq x} Pr(Z = z \mid X_1 = x, X_2 = x) = 0, \]

belongs to the metric segment $[X_1,X_2]$.
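
The counterexample of the remark can be checked by direct enumeration. The sketch below is only illustrative: the joint probability mass function of $(X_1, X_2)$ and the two parameter values (written `lam` and `lam_tilde`) are hypothetical choices, and the distance is computed as $d(U,V) = Pr(U \neq V)$.

```python
from fractions import Fraction as F

# Hypothetical joint pmf of (X1, X2) on {1, 2}^2; any pmf could be used here.
joint = {(1, 1): F(1, 4), (1, 2): F(1, 4), (2, 1): F(1, 4), (2, 2): F(1, 4)}
lam, lam_tilde = F(1, 3), F(3, 4)  # two different conditional parameters


def pmf_z(x1, x2):
    """Conditional pmf of Z given (X1, X2) = (x1, x2), as in the remark."""
    if x1 == x2:
        return {x1: F(1)}
    if (x1, x2) == (2, 1):
        return {2: lam, 1: 1 - lam}
    return {1: lam_tilde, 2: 1 - lam_tilde}


def dist(a, b):
    """d(U, V) = Pr(U != V), where a and b name two of 'X1', 'X2', 'Z'."""
    total = F(0)
    for (x1, x2), p in joint.items():
        for z, pz in pmf_z(x1, x2).items():
            values = {"X1": x1, "X2": x2, "Z": z}
            if values[a] != values[b]:
                total += p * pz
    return total


lhs = dist("X1", "Z") + dist("Z", "X2")
rhs = dist("X1", "X2")
print(lhs, rhs)
```

The two sides coincide although the conditional probabilities of keeping $X_1$ differ between the two mixed configurations, which is what excludes $Z$ from the convex hull.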


Corollary 3.6.1. Let $n$ be a positive integer and let $A = \{X_1, \ldots, X_n\}$ be a set of points in $\mathcal{X}$. Consider the independent random variable $\theta$ taking values in the finite set $\{1, \ldots, n\}$, with probability measure given by $Pr(\omega : \theta(\omega) = i) = w_i$, for some non-negative scalars $w_i$, with $\sum_{i=1}^{n} w_i = 1$. Then

\[ co_\epsilon(A) = \Big\{ Z \in \mathcal{X} \ \Big|\ Z = \sum_{i=1}^{n} 1_{\{\theta = i\}} X_i,\ \forall w_i \ge \epsilon,\ \sum_{i=1}^{n} w_i = 1 \Big\}. \tag{3.48} \]

Proof. Follows immediately from Definition 3.2.5 and Theorem 3.6.1.

□


Recall the discussion introduced by Remark 3.2.1 on what we understand by a small enough value of $\epsilon$.


3.6.2  Consensus  of  Opinion  Algorithm 

We assume that each agent of a group of $n$ agents has an initial opinion. We model the set of opinions by a finite set of distinct integers, say $S = \{1, 2, \ldots, s\}$ for some positive integer $s$, where each element of $S$ indicates an opinion. The goal of the agents is to reach the same opinion by repeatedly discussing among themselves.

Denoting as before by $k$ the time-index and by $G(k) = (V, E(k))$ the time varying graph modeling the communication network among the $n$ agents, we model the evolution of the opinion of an agent $i$ as a random process $X_i(k)$, where $X_i(k) \in \mathcal{X}$ for all $k \ge 0$. Each agent $i$ has an initial opinion $X_i(0) = l \in S$ with probability $p_{il} \ge 0$, with $\sum_{l=1}^{s} p_{il} = 1$.

Corollary 3.6.2. Let Assumptions 3.3.1 and 3.3.2 hold for $G(k)$. Given a small enough, positive scalar $\epsilon < 1$, assume that at every time-slot each agent $i$ rolls an imaginary die with $|N_i(k)|$ facets numbered from 1 to $|N_i(k)|$, independently of the other agents. The probability that the result of a die roll is $j \in N_i(k)$ is $w_{ij}(k)$, with $w_{ij}(k) \ge \epsilon$ and $\sum_{j \in N_i(k)} w_{ij}(k) = 1$. The agent $i$ updates its state according to the following scheme. If the result of the die roll is $j$ then agent $i$ chooses the opinion of agent $j$. We then have that the agents asymptotically agree on the same opinion, i.e.

\[ \lim_{k \to \infty} d(X_i(k), X_j(k)) = 0, \quad \forall i, j. \]

Proof. By modeling the die of agent $i$ as an i.i.d. random process $\theta_i(k) \in \{1, 2, \ldots, |N_i(k)|\}$ such that $Pr(\theta_i(k) = j) = w_{ij}(k)$ for all $j \in N_i(k)$ and for all $i$, $k \ge 0$, the update scheme of agent $i$ can be formally written as

\[ X_i(k+1) = \sum_{j \in N_i(k)} 1_{\{\theta_i(k) = j\}} X_j(k). \tag{3.49} \]

However, this implies that $X_i(k+1) \in co_\epsilon(A_i(k))$, $\forall i, k$, and the result follows from Theorem 3.3.1. □
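
The update scheme of Corollary 3.6.2 is easy to simulate. The following sketch is one possible rendering, assuming a fixed path graph of four agents in which every agent is its own neighbor; the graph, the weights, the initial opinions and the function names are all hypothetical, and a fixed seed is used only to make the run repeatable.

```python
import random


def opinion_step(opinions, neighbors, weights, rng):
    """One synchronous update: agent i copies the neighbor selected by its die."""
    new = []
    for i in range(len(opinions)):
        j = rng.choices(neighbors[i], weights=weights[i], k=1)[0]
        new.append(opinions[j])
    return new


def run_until_agreement(opinions, neighbors, weights, seed=0, max_steps=100_000):
    """Iterate the scheme until all opinions coincide or max_steps is reached."""
    rng = random.Random(seed)
    steps = 0
    while len(set(opinions)) > 1 and steps < max_steps:
        opinions = opinion_step(opinions, neighbors, weights, rng)
        steps += 1
    return opinions, steps


# Hypothetical path graph 0 - 1 - 2 - 3; each row of weights sums to one.
neighbors = [[0, 1], [0, 1, 2], [1, 2, 3], [2, 3]]
weights = [[0.5, 0.5], [0.25, 0.5, 0.25], [0.25, 0.5, 0.25], [0.5, 0.5]]
initial = [1, 2, 3, 2]
final, steps = run_until_agreement(initial, neighbors, weights)
print(final, steps)
```

Since an agent can only copy an existing opinion, the common value reached is always one of the initial opinions.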

3.6.3  Probabilistic  analysis  of  the  consensus  algorithm 

In this section we give a probabilistic analysis of the consensus of opinion algorithm introduced in the previous section. We discuss the different modes of convergence to agreement (from a probabilistic point of view) and we give an alternative proof of Corollary 3.6.2 using purely probabilistic arguments. In addition, we discuss the convergence in distribution of the states of the agents to a particular random variable and we redefine the notion of average consensus from $\mathbb{R}^n$ to fit the metric space $\mathcal{X}$.

Corollary 3.6.2 shows that under the proposed scheme the distances between the states of the agents converge to zero. However, since $\mathcal{X}$ is a space of discrete random variables, we can say more about the modes of convergence of the states of the agents. Recall that we defined the distance between two points $X_1, X_2 \in \mathcal{X}$ as

\[ d(X_1, X_2) = E[\rho(X_1, X_2)] = Pr(X_1 \neq X_2). \]

From Corollary 3.6.2 we have that

\[ \lim_{k \to \infty} d(X_i(k), X_j(k)) = 0, \]

or equivalently

\[ \lim_{k \to \infty} Pr(X_i(k) \neq X_j(k)) = 0. \tag{3.50} \]

This says that the measure of the set on which $X_i(k)$ and $X_j(k)$ are different converges to zero as $k$ goes to infinity, i.e. the agents asymptotically agree in probability sense. In what follows we show that in fact the agents asymptotically agree with probability one (or in almost sure sense).

Given an arbitrary $\epsilon > 0$, we define the event

\[ B_k(\epsilon) = \{\omega : \max_{i \neq j} |X_i(k) - X_j(k)| > \epsilon\}. \]

An upper bound on the probability of the event $B_k(\epsilon)$ is given by

\[ \begin{aligned}
Pr(B_k(\epsilon)) &= Pr\Big(\bigcup_{i \neq j} \{\omega : |X_i(k) - X_j(k)| > \epsilon\}\Big) \\
&\le \sum_{i \neq j} Pr(|X_i(k) - X_j(k)| > \epsilon) \le \sum_{i \neq j} Pr(X_i(k) \neq X_j(k)).
\end{aligned} \tag{3.51} \]

From (3.50) and (3.51) we obtain

\[ \lim_{k \to \infty} Pr(B_k(\epsilon)) = 0. \]

Recall that by inequality (3.38), $d(X_i(k), X_j(k)) = Pr(X_i(k) \neq X_j(k))$, $\forall i, j$, converge at least geometrically to zero. Therefore

\[ \sum_{k \ge 0} Pr(B_k(\epsilon)) < \infty, \]

and by the Borel-Cantelli lemma we have that

\[ Pr(B_k(\epsilon) \text{ happens infinitely often}) = 0. \]

Equivalently, this also means that

\[ Pr\Big(\lim_{k \to \infty} \max_{i \neq j} |X_i(k) - X_j(k)| = 0\Big) = 1, \]

or that the agents asymptotically agree with probability one.

In the following we show that the same result can be obtained by using purely probabilistic arguments. For simplicity we assume that the communication network remains constant and connected and that the coefficients $w_{ij}$ from the agreement scheme are constant as well.

Proposition 3.6.2. Let the graph modeling the communication network be time invariant and connected and let the agents update their state according to the scheme described in Corollary 3.6.2, where $w_{ij} > 0$ are assumed constant for all $k \ge 0$. We then have that the agents asymptotically agree with probability one, i.e.

\[ Pr\Big(\lim_{k \to \infty} \max_{i \neq j} |X_i(k) - X_j(k)| = 0\Big) = 1. \tag{3.52} \]

Proof. We define the random process $Z(k) = (X_1(k), X_2(k), \ldots, X_n(k))$, which has a maximum of $s^n$ states, and we introduce the agreement space as

\[ \mathcal{A} = \{(o, o, \ldots, o) \mid o \in S\}. \]

We saw earlier that the state update dynamics is given by

\[ X_i(k+1) = \sum_{j \in N_i} 1_{\{\theta_i(k) = j\}} X_j(k), \]

where $Pr(\theta_i(k) = j) = w_{ij}$, for all $j \in N_i$ and for all $i$. The conditional probability of $X_i(k+1)$ conditioned on $X_j(k)$, $j \in N_i$, is given by

\[ Pr(X_i(k+1) = o_i \mid X_j(k) = o_j,\ j \in N_i) = \sum_{j \in N_i} w_{ij} 1_{\{o_i = o_j\}}. \tag{3.53} \]

It is not difficult to note that $Z(k)$ is a finite state, homogeneous Markov chain. We will show that $Z(k)$ has $s$ absorbing states and all other $s^n - s$ states are transient, where the absorbing states correspond to the states in the agreement space $\mathcal{A}$. Using the independence of the random processes $\theta_i(k)$, the entries of the probability transition matrix of $Z(k)$ can be derived from (3.53) and are given by

\[ Pr(X_1(k+1) = o_{l_1}, \ldots, X_n(k+1) = o_{l_n} \mid X_1(k) = o_{p_1}, \ldots, X_n(k) = o_{p_n}) = \prod_{i=1}^{n} \sum_{j \in N_i} w_{ij} 1_{\{o_{l_i} = o_{p_j}\}}. \tag{3.54} \]

We note from (3.54) that once the process reaches an agreement state it will stay there indefinitely, i.e.

\[ Pr(X_1(k+1) = o, \ldots, X_n(k+1) = o \mid X_1(k) = o, \ldots, X_n(k) = o) = 1, \quad \forall o \in S, \]

and hence the agreement states are absorbing states. We will show next that, under the connectivity assumption, the agreement space is reachable from any state, and therefore all other states are transient. We are not saying that all agreement states are reachable from any state, but that from any state at least one agreement state is reachable. Let $(o_1, o_2, \ldots, o_n) \notin \mathcal{A}$, with $o_j \in S$, $j = 1 \ldots n$, be an arbitrary state. We first note that from this state only agreement states of the form $(o_j, o_j, \ldots, o_j)$ can be reached. Given that $X_j(0) = o_j$, we show that with positive probability the agreement vector $(o_j, o_j, \ldots, o_j)$ can be reached. At time slot one, with probability $w_{jj}$ agent $j$ keeps its initial choice, while its neighbors to which it sends information can choose $o_j$ with some positive probability, i.e. $X_i(1) = o_j$ with probability $w_{ij}$, for all $i$ such that $j \in N_i$. Due to the connectivity assumption there exists at least one $i$ such that $j \in N_i$. At the next time-index all the agents which have already chosen $o_j$ keep their opinion with positive probability, while their neighbors will choose $o_j$ with positive probability. Since the communication network is assumed connected, every agent will be able to choose $o_j$ with positive probability in at most $n - 1$ steps, therefore an agreement state can be reached with positive probability. Hence, from any initial state $(o_1, o_2, \ldots, o_n) \notin \mathcal{A}$, all agreement states of the form $(o_j, o_j, \ldots, o_j)$ with $j = 1 \ldots n$ are reachable with positive probability. Since the agreement states are absorbing states, it follows that $(o_1, o_2, \ldots, o_n) \notin \mathcal{A}$ is a transient state. Therefore, the probability for the Markov chain $Z(k)$ to be in a transient state converges asymptotically to zero, while the probability to be in one of the agreement states converges asymptotically to one, i.e.

\[ \lim_{k \to \infty} Pr(Z(k) \notin \mathcal{A}) = 0, \]

or equivalently

\[ \lim_{k \to \infty} Pr\Big(\bigcup_{i \neq j} \{X_i(k) \neq X_j(k)\}\Big) = 0. \tag{3.55} \]

Given an arbitrary $\epsilon > 0$, we define the event

\[ B_k(\epsilon) = \{\omega : \max_{i \neq j} |X_i(k) - X_j(k)| > \epsilon\}. \]

But since

\[ B_k(\epsilon) = \bigcup_{i \neq j} \{|X_i(k) - X_j(k)| > \epsilon\} \subset \bigcup_{i \neq j} \{X_i(k) \neq X_j(k)\}, \]

from (3.55) it follows that

\[ \lim_{k \to \infty} Pr(B_k(\epsilon)) \le \lim_{k \to \infty} Pr\Big(\bigcup_{i \neq j} \{X_i(k) \neq X_j(k)\}\Big) = 0, \]

and hence the agents asymptotically agree in probability sense. In addition, due to the geometric decay toward zero of the probability $Pr(Z(k) \notin \mathcal{A})$, by the Borel-Cantelli Lemma the result follows.


□ 
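
The structure of the Markov chain used in the proof can be inspected on a small instance. The sketch below is a minimal illustration, assuming three agents on a path graph, two opinions and hypothetical constant weights; it builds the transition probabilities from the product formula (3.54) and lists the absorbing states.

```python
from fractions import Fraction as F
from itertools import product

S = (1, 2)                                   # set of opinions
N = [[0, 1], [0, 1, 2], [1, 2]]              # hypothetical neighbor sets (path graph)
W = [[F(1, 2), F(1, 2)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(1, 2), F(1, 2)]]                     # weights w_ij aligned with N
states = list(product(S, repeat=len(N)))     # the s^n joint states


def transition(cur, nxt):
    """Transition probability from cur to nxt, following (3.54)."""
    p = F(1)
    for i, nbrs in enumerate(N):
        p *= sum(w for j, w in zip(nbrs, W[i]) if cur[j] == nxt[i])
    return p


P = {a: {b: transition(a, b) for b in states} for a in states}
agreement = [a for a in states if len(set(a)) == 1]
absorbing = [a for a in states if P[a][a] == 1]


def prob_disagreement(start, k):
    """Pr(Z(k) is not an agreement state | Z(0) = start)."""
    dist = {a: F(int(a == start)) for a in states}
    for _ in range(k):
        dist = {b: sum(dist[a] * P[a][b] for a in states) for b in states}
    return sum(p for a, p in dist.items() if a not in agreement)


print(absorbing, float(prob_disagreement((1, 2, 1), 20)))
```

In this instance the absorbing states are exactly the $s = 2$ agreement states, and the probability of disagreement shrinks as $k$ grows.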


We discussed above the different modes of convergence of the agents to the same opinion, but we said nothing about where the states actually converge. However, from Corollary 3.3.1 we know that there exists a random variable $X^* \in \mathcal{X}$ such that

\[ \lim_{k \to \infty} d(X_i(k), X^*) = 0, \quad \forall i, \]

or equivalently

\[ \lim_{k \to \infty} Pr(X_i(k) \neq X^*) = 0, \quad \forall i, \]

which implies that the states of the agents $X_i(k)$ converge to $X^*$ in probability. Still, this tells us nothing about the properties of $X^*$. In what follows we analyze the evolution of the probability with which an agent $i$ chooses between the initial values (opinions) of the other agents in the network. Also, we focus on the convergence in distribution to $X^*$ and more precisely we characterize the distribution of $X^*$.

By defining the vector $Z(k) = (X_1(k), X_2(k), \ldots, X_n(k))'$, (3.49) can be compactly written as

\[ Z(k+1) = \Theta(k) Z(k), \quad Z(0) = Z_0, \tag{3.56} \]

where $[\Theta(k)]_{ij} = 1_{\{\theta_i(k) = j\}}$ and where $\theta_i(k)$ are independent random processes with $Pr(\theta_i(k) = j) = w_{ij}(k)$, $w_{ij}(k) \ge \epsilon$ and $\sum_{j \in N_i(k)} w_{ij}(k) = 1$. Consequently

\[ Z(k) = \Gamma(k) Z(0), \]

where $\Gamma(k) = \Theta(k-1)\Theta(k-2) \cdots \Theta(1)\Theta(0)$ is the transition matrix of (3.56). It can be easily argued that the $(i,j)$ entry of $\Gamma(k)$ can be expressed as

\[ [\Gamma(k)]_{ij} = 1_{\{\bar{\theta}_i(k) = j\}}, \tag{3.57} \]

where $\bar{\theta}_i(k)$ are random processes taking values in the discrete set $\{1, 2, \ldots, n\}$. The quantity $1_{\{\bar{\theta}_i(k) = j\}}$ is updated according to the expression

\[ 1_{\{\bar{\theta}_i(k+1) = j\}} = \sum_{l=1}^{n} 1_{\{\theta_i(k) = l\}} 1_{\{\bar{\theta}_l(k) = j\}} = \sum_{l=1}^{n} 1_{\{\theta_i(k) = l,\ \bar{\theta}_l(k) = j\}}, \tag{3.58} \]

where the second equality followed from the independence of $\theta_i(k)$ and with $1_{\{\bar{\theta}_i(1) = j\}} = 1_{\{\theta_i(0) = j\}}$ for all $i, j$ pairs. Since the events $\{\omega : \theta_i(k) = l,\ \bar{\theta}_l(k) = j\}$ for $l = 1 \ldots n$ are mutually exclusive, $1_{\{\bar{\theta}_i(k+1) = j\}}$ is indeed well defined. The probability mass function of $\bar{\theta}_i(k)$ is given by

\[ Pr(\bar{\theta}_i(k) = j) = [W(k-1)W(k-2) \cdots W(1)W(0)]_{ij}, \]

where $[W(k)]_{ij} = w_{ij}(k)$.

It is not difficult to observe that the entries of $\Gamma(k)$ act as selectors between the different entries of the initial vector $Z(0)$, i.e.

\[ X_i(k) = \sum_{j=1}^{n} 1_{\{\bar{\theta}_i(k) = j\}} X_j(0). \]

Therefore, the probability for $X_i(k)$ to choose $X_j(0)$ is given by the probability of $\bar{\theta}_i(k)$ to choose $j$, i.e.

\[ Pr(X_i(k) = X_j(0)) = Pr(\bar{\theta}_i(k) = j). \]
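
The link between the composed selectors and the product of the weight matrices can be verified exactly on a toy case. The following sketch assumes three agents and two hypothetical row-stochastic matrices `W0` and `W1`; it enumerates all joint outcomes of the dice at times 0 and 1 and accumulates the probability that agent $i$ ends up holding the initial value of agent $j$ after two updates.

```python
from fractions import Fraction as F
from itertools import product


def matmul(A, B):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]


# Hypothetical row-stochastic weight matrices W(0) and W(1) for n = 3 agents.
W0 = [[F(1, 2), F(1, 2), F(0)], [F(1, 4), F(1, 2), F(1, 4)], [F(0), F(1, 2), F(1, 2)]]
W1 = [[F(2, 3), F(1, 3), F(0)], [F(1, 3), F(1, 3), F(1, 3)], [F(0), F(1, 3), F(2, 3)]]
n = 3

composite = [[F(0)] * n for _ in range(n)]
for th0 in product(range(n), repeat=n):        # dice outcomes at time 0
    for th1 in product(range(n), repeat=n):    # dice outcomes at time 1
        p = F(1)
        for i in range(n):
            p *= W0[i][th0[i]] * W1[i][th1[i]]
        if p == 0:
            continue
        for i in range(n):
            # X_i(2) = X_{th1[i]}(1) = X_{th0[th1[i]]}(0)
            composite[i][th0[th1[i]]] += p

two_step = matmul(W1, W0)
print(composite == two_step)
```

The enumeration reproduces the matrix product $W(1)W(0)$ entry by entry, which is the two-step instance of the selection probabilities discussed above.
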

Under Assumptions 3.3.1 and 3.3.2, we can invoke Lemmas 3 and 4 of [34], and obtain that there exists a vector $v$ with positive entries summing up to one, such that

\[ \lim_{k \to \infty} W(k)W(k-1) \cdots W(1)W(0) = \mathbf{1}v', \]

where $\mathbf{1}$ is the vector of all ones. Therefore, as $k$ goes to infinity the agents will pick among the initial values $X_j(0)$ with probability $v_j$, i.e.

\[ \lim_{k \to \infty} Pr(X_i(k) = X_j(0)) = \lim_{k \to \infty} Pr(\bar{\theta}_i(k) = j) = v_j, \tag{3.59} \]

where $v_j$ is the $j$th entry of vector $v$. In particular, if the matrix $W(k)$ is doubly stochastic, then by Proposition 1 of [34], $v = \frac{1}{n}\mathbf{1}$ and consequently

\[ \lim_{k \to \infty} Pr(X_i(k) = X_j(0)) = \frac{1}{n}. \tag{3.60} \]

This leads us to redefining the average consensus concept from $\mathbb{R}^n$ to our particular convex metric space $\mathcal{X}$, i.e. we can say that the agents reach average consensus if they asymptotically agree on the different initial opinions with the same probability.
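
The doubly stochastic case can be illustrated numerically. The sketch below assumes a constant, symmetric weight matrix on a ring of four agents (a hypothetical choice) and multiplies it repeatedly; all entries of the product approach $1/n$.

```python
def matmul(A, B):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]


n = 4
# Hypothetical symmetric (hence doubly stochastic) weights on a ring of 4 agents.
W = [[0.5 if i == j else 0.25 if abs(i - j) in (1, n - 1) else 0.0
      for j in range(n)] for i in range(n)]

T = [[float(i == j) for j in range(n)] for i in range(n)]  # identity matrix
for _ in range(200):
    T = matmul(W, T)   # T holds the product of 200 copies of W at the end

print(T[0])
```

Every row of the resulting matrix is close to $(1/4, 1/4, 1/4, 1/4)$, so each agent selects each initial opinion with the same limiting probability.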

Remarkably, from (3.60) it also follows that $X_i(k)$ converge in distribution to a random variable $X^*$ given by

\[ X^* = \sum_{j=1}^{n} 1_{\{\theta^* = j\}} X_j(0), \]

where $Pr(\theta^* = j) = \frac{1}{n}$. Note that $X^*$ is a point in the convex hull of $\{X_1(0), \ldots, X_n(0)\}$ generated by associating equal weights to the initial values $X_j(0)$. Hence, $X^*$ can be interpreted as the (empirical) average of the initial values.

Introducing the vector $p^l(k) = (p^l_i(k))$, where $p^l_i(k) = Pr(X_i(k) = l)$ for some $l \in S$, from (3.49) and from the independence of the random processes $\theta_i(k)$, we obtain that the evolution of $p^l(k)$ respects the equation

\[ p^l(k+1) = W(k) p^l(k), \quad p^l(0) = p^l_0, \tag{3.61} \]

where $[W(k)]_{ij} = w_{ij}(k)$. Hence we obtain that there exists a vector $v$ with positive entries summing up to one, such that

\[ \lim_{k \to \infty} W(k)W(k-1) \cdots W(1)W(0) = \mathbf{1}v'. \]

Therefore, by defining $\pi_l = \sum_{j=1}^{n} v_j Pr(X_j(0) = l)$, where $v_j$ is the $j$th entry of $v$, we have that

\[ \lim_{k \to \infty} Pr(X_i(k) = l) = \pi_l, \quad \forall i, \]

or equivalently that $X_i(k)$ converge in distribution to a random variable $X^*$ whose probability mass function is given by $Pr(X^* = l) = \pi_l$, for all $l \in S$. If in addition we have that $W(k)$ is doubly stochastic, we have that

\[ \lim_{k \to \infty} Pr(X_i(k) = l) = \frac{1}{n} \sum_{j=1}^{n} Pr(X_j(0) = l). \]
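
For a weight matrix that is only row stochastic, the limit $\pi_l$ is a weighted rather than uniform average of the initial probabilities. The sketch below uses a hypothetical constant matrix `W` and a hypothetical initial vector `p0` for one fixed opinion $l$; the vector `v` is approximated by iterating a row vector, which is one simple way (not the only one) to obtain it.

```python
def matvec(W, p):
    """Return W p for a square matrix W and a column vector p."""
    return [sum(w * x for w, x in zip(row, p)) for row in W]


def vecmat(v, W):
    """Return v' W for a row vector v and a square matrix W."""
    n = len(W)
    return [sum(v[i] * W[i][j] for i in range(n)) for j in range(n)]


# Hypothetical constant row-stochastic weights (not doubly stochastic).
W = [[0.6, 0.4, 0.0],
     [0.2, 0.5, 0.3],
     [0.0, 0.1, 0.9]]
p0 = [1.0, 0.5, 0.0]       # p_i^l(0) = Pr(X_i(0) = l) for one fixed opinion l

p = list(p0)
for _ in range(2000):
    p = matvec(W, p)       # iteration (3.61) with constant W

v = [1.0, 0.0, 0.0]        # first row of the identity; v' W^k tends to v'
for _ in range(2000):
    v = vecmat(v, W)

pi_l = sum(vj * pj for vj, pj in zip(v, p0))
print(p, pi_l)
```

All entries of `p` end up at the same value `pi_l`, in agreement with the limit stated above.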

3.6.4  Numerical  example 

In what follows we consider an example where a group of eight agents ($n = 8$) have to choose between two opinions, i.e. $S = \{1, 2\}$. We assume that the agents' communication network is given by an undirected circular graph as in Figure 3.2, assumed fixed for all time-slots.

Figure 3.2: Undirected circular graph with eight nodes

We assume that the agents use the scheme described by Corollary 3.6.2 for updating their states, where the coefficients $w_{ij}$ are constant. In particular we choose $w_{ii} = 7/9$ and $w_{i,i-1} = w_{i,i+1} = 1/9$ and choose as initial values $X_i(0) = 1$ for $i = 1 \ldots 4$ and $X_i(0) = 2$ for $i = 5 \ldots 8$ with probability one. Figure 3.3 presents an execution of our agreement algorithm which indeed shows that the agents agree on the same opinion. The different colors that appear indicate different agents.


Figure  3.3:  Execution  of  the  agreement  algorithm 

Next we numerically analyze the evolution of the vector of distances $\mathbf{d}(k) = (d(X_i(k), X_j(k)))$, $\forall i \neq j$. First we see that under our assumptions the entries of the matrix $\mathbf{W}(k)$ are given by $[\mathbf{W}(k)]_{\bar{i}\bar{j}} = w_{ip}w_{jq}$, where $\bar{i}$ and $\bar{j}$ correspond to the pairs of agents $(i,j)$ and $(p,q)$, respectively, and where $w_{ij}$ define the probability mass function of the random variables $\theta_i(k)$ as described in Corollary 3.6.2. We consider the linear system

\[ \bar{\mathbf{d}}(k+1) = \mathbf{W}(k)\bar{\mathbf{d}}(k), \quad \bar{\mathbf{d}}(0) = \mathbf{d}(0). \]

By (3.20) of Lemma 3.4.1, we have that $\bar{\mathbf{d}}(k)$ is an upper bound of $\mathbf{d}(k)$. Figure 3.4 presents the evolution of $\|\bar{\mathbf{d}}(k)\|_\infty$ with time. It is worth mentioning that since $\psi$ defined in (3.43) satisfies the definition of a convex structure with equality, it can be easily argued that (3.20) holds with equality and therefore the upper bound $\bar{\mathbf{d}}(k)$ is in fact $\mathbf{d}(k)$.
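
The pairwise distance recursion can be reproduced with a few lines of code. The sketch below is an informal rendering for the ring of eight agents with $w_{ii} = 7/9$ and neighbor weights $1/9$; the variable names are hypothetical, the distances are stored as an $n \times n$ array instead of a stacked vector, and pairs with $i = j$ are kept at zero.

```python
n = 8
w = [[0.0] * n for _ in range(n)]
for i in range(n):
    w[i][i] = 7 / 9
    w[i][(i - 1) % n] = 1 / 9
    w[i][(i + 1) % n] = 1 / 9

x0 = [1] * 4 + [2] * 4   # deterministic initial opinions
# d[i][j] = d(X_i(k), X_j(k)) = Pr(X_i(k) != X_j(k))
d = [[float(x0[i] != x0[j]) for j in range(n)] for i in range(n)]

history = []             # infinity norm of the distance vector over time
for k in range(300):
    history.append(max(max(row) for row in d))
    new = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # entry (i,j),(p,q) of the pairwise matrix is w_ip * w_jq
                new[i][j] = sum(w[i][p] * w[j][q] * d[p][q]
                                for p in range(n) for q in range(n))
    d = new

print(history[0], history[-1])
```

The recorded norm starts at one and decreases toward zero, which is the qualitative behavior reported for Figure 3.4.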


Figure 3.4: Evolution of $\|\bar{\mathbf{d}}(k)\|_\infty$ with time

We next analyze the distance between the initial points and the consensus point(s). Since $\psi$ respects the definition of a convex structure with equality, we have that

\[ d(X_i(k+1), X_l(0)) = \sum_{j \in N_i} w_{ij}\, d(X_j(k), X_l(0)), \]

which is basically a consensus algorithm. Since the consensus matrix is doubly stochastic we know that

\[ \lim_{k \to \infty} d(X_i(k), X_l(0)) = \frac{1}{n} \sum_{j=1}^{n} d(X_j(0), X_l(0)). \]

Figure 3.5 presents the evolution of the distance between $X_i(k)$ and $X_1(0)$ for $i = 1 \ldots n$.

Figure 3.5: The distances between $X_i(k)$ and $X_1(0)$ for $i = 1 \ldots 8$

Considering our choice for initial values and the fact that $n = 8$ it is not difficult to see that

\[ \frac{1}{8} \sum_{j=1}^{8} d(X_j(0), X_1(0)) = \frac{1}{2}, \]

which is also what Figure 3.5 shows.
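
The limiting value $1/2$ can be recovered by iterating the distance recursion directly. The sketch below assumes the ring of eight agents with $w_{ii} = 7/9$ and neighbor weights $1/9$, and tracks $d(X_i(k), X_1(0))$ for every agent; the variable names are hypothetical.

```python
n = 8
W = [[0.0] * n for _ in range(n)]
for i in range(n):
    W[i][i] = 7 / 9
    W[i][(i - 1) % n] = 1 / 9
    W[i][(i + 1) % n] = 1 / 9

# d[i] = d(X_i(0), X_1(0)): zero for agents starting at opinion 1, one otherwise.
d = [0.0] * 4 + [1.0] * 4
average = sum(d) / n

for _ in range(1000):
    d = [sum(W[i][j] * d[j] for j in range(n)) for i in range(n)]

print(d, average)
```

Because the weight matrix is doubly stochastic, the average of the entries is preserved at every step and all entries approach it.
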

Chapter 4

Distributed Asymptotic Agreement Problem under Markovian Random Topologies

4.1 Introduction

This chapter deals with the linear consensus problem for a group of dynamic agents. We assume that the communication flow between agents is modeled by a (possibly directed) randomly switching graph. The switching is determined by a homogeneous, finite-state Markov chain, each communication pattern corresponding to a state of the Markov process. We address both the cases where the dynamics of the agents is expressed in continuous and discrete time and, under certain assumptions on the consensus matrices, we give necessary and sufficient conditions to guarantee convergence to average consensus in mean square and in almost sure sense. The Markovian switching model goes beyond the common i.i.d. assumption on the random communication topology and appears in cases where Rayleigh fading channels are considered. One of the goals of this chapter is to show how mathematical techniques used in the stability analysis of Markovian jump linear systems, together with results inspired by matrix and graph theory, can be used to prove (intuitively clear) convergence results for the (linear) stochastic consensus problem.

Basic notations and definitions: We denote by $\mathbf{1}$ the vector of all ones. If the dimension of the vector needs to be emphasized, an index will be added for clarity (for example, if $\mathbf{1}$ is an $n$ dimensional vector, we will explicitly mark this by using $\mathbf{1}_n$). Let $x$ be a vector in $\mathbb{R}^n$. By $av(x)$ we denote the quantity $av(x) = x'\mathbf{1}/\mathbf{1}'\mathbf{1}$. The symbols $\otimes$ and $\oplus$ represent the Kronecker product and sum, respectively. Given a matrix $A$, $Null(A)$ designates the nullspace of the considered matrix. If $\mathcal{X}$ is some finite dimensional space, $dim(\mathcal{X})$ gives us the dimension of $\mathcal{X}$. We denote by $col(A)$ a vector containing the columns of matrix $A$.

Let $\mathcal{M}$ be a set of matrices and let $A$ be some matrix. By $\mathcal{M}^T$ we denote the set of the transpose matrices of $\mathcal{M}$, i.e. $\mathcal{M}^T = \{M' \mid M \in \mathcal{M}\}$. By $\mathcal{M} \otimes A$ we understand the following matrix set: $\mathcal{M} \otimes A = \{M \otimes A \mid M \in \mathcal{M}\}$. By writing that $A\mathcal{M} = \mathcal{M}$ we understand that $AM \in \mathcal{M}$, for any $M \in \mathcal{M}$.

Let $P$ be a probability transition matrix corresponding to a homogeneous, finite state Markov chain. We denote by $\mathcal{P}_\infty$ the limit set of the sequence $\{P^k\}_{k \ge 0}$, i.e. all matrices $L$ for which there exists a sequence $\{t_k\}_{k \ge 0}$ in $\mathbb{N}$ such that $\lim_{k \to \infty} P^{t_k} = L$. Note that if the matrix $P$ corresponds to an ergodic Markov chain, the cardinality of $\mathcal{P}_\infty$ is one, with the limit point $\mathbf{1}\pi'$, where $\pi$ is the stationary distribution. If the Markov chain is periodic with period $m$, the cardinality of $\mathcal{P}_\infty$ is $m$. Let $d(M, \mathcal{P}_\infty)$ denote the distance from $M$ to the set $\mathcal{P}_\infty$, that is the smallest distance from $M$ to any matrix in $\mathcal{P}_\infty$:

\[ d(M, \mathcal{P}_\infty) = \inf_{L \in \mathcal{P}_\infty} \|L - M\|, \]

where $\|\cdot\|$ is a matrix norm.
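
The notion of limit set can be made concrete on two small chains. The sketch below is illustrative only: the two transition matrices are hypothetical, and the Frobenius norm is assumed as the matrix norm. The periodic chain has two limit points, while the ergodic one has a single limit point whose rows equal the stationary distribution.

```python
import math


def matmul(A, B):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(P, k):
    """Return P^k by repeated multiplication (k >= 0)."""
    n = len(P)
    R = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        R = matmul(R, P)
    return R


def dist_to_set(M, limit_set):
    """Smallest Frobenius distance from M to any matrix in limit_set."""
    return min(
        math.sqrt(sum((L[i][j] - M[i][j]) ** 2
                      for i in range(len(M)) for j in range(len(M))))
        for L in limit_set
    )


P_periodic = [[0.0, 1.0], [1.0, 0.0]]          # period 2: powers alternate
limit_set = [mat_power(P_periodic, 100), mat_power(P_periodic, 101)]

P_ergodic = [[0.9, 0.1], [0.5, 0.5]]           # stationary distribution (5/6, 1/6)
limit_point = mat_power(P_ergodic, 200)

print(limit_set, limit_point)
```

For the periodic chain every power is at distance zero from the limit set, although the sequence of powers itself does not converge.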


Definition 4.1.1. Let $A$ be a matrix in $\mathbb{R}^{n \times n}$ and let $G = (V, E)$ be a graph of order $n$. We say that matrix $A$ corresponds to graph $G$ or that graph $G$ corresponds to matrix $A$ if an edge $e_{ij}$ belongs to $E$ if and only if the $(i,j)$ entry of $A$ is non-zero. The graph corresponding to $A$ will be denoted by $G_A$.


Definition 4.1.2. Let $s$ be a positive integer and let $\mathcal{A} = \{A_i\}_{i=1}^{s}$ be a set of matrices with a corresponding set of graphs $\mathcal{G} = \{G_{A_i}\}_{i=1}^{s}$. We say that the graph $G_{\mathcal{A}}$ corresponds to the set $\mathcal{A}$ if it is given by the union of graphs in $\mathcal{G}$, i.e.

\[ G_{\mathcal{A}} = \bigcup_{i=1}^{s} G_{A_i}. \]

In this note we will mainly use two types of matrices: probability transition matrices (rows sum up to one) and generator matrices (rows sum up to zero). A generator matrix whose rows and columns both sum up to zero will be called a doubly stochastic generator matrix.

To simplify the exposition we will sometimes characterize a probability transition/generator matrix as being irreducible or strongly connected and by this we understand that the corresponding Markov chain (directed graph) is irreducible (strongly connected).

Definition 4.1.3. Let $A \in \mathbb{R}^{n \times n}$ be a probability transition/generator matrix. We say that $A$ is block diagonalizable if there exists a similarity transformation $P$, encapsulating a number of row permutations, such that $PAP'$ is a block diagonal matrix with irreducible blocks on the main diagonal.

For  simplicity,  the  time  index  for  both  the  continuous  and  discrete-time  cases  is 
denoted  by  t. 

Chapter organization: In Section 4.2 we present the setup and formulation of the problem and we state our main convergence theorem. In Section 4.3 we derive a number of results which constitute the core of the proof of our main result; the proof itself is given in Section 4.4. Section 4.5 contains a discussion of our convergence result.

4.2 Problem formulation and statement of the convergence result

We assume that a group of $n$ agents, labeled 1 through $n$, is organized in a communication network whose topology is given by a time varying graph $G(t) = (V, E(t))$, where $V$ is the set of $n$ vertices and $E(t)$ is the time varying set of edges. The graph $G(t)$ has an underlying random process governing its evolution, given by a homogeneous, continuous or discrete time Markov chain $\theta(t)$, taking values in the finite set $\{1, \ldots, s\}$, for some positive integer $s$. In the case $\theta(t)$ is a discrete-time Markov chain, its probability transition matrix is $P = (p_{ij})$ (rows sum up to one), while if $\theta(t)$ is a continuous time Markov chain, its generator matrix is denoted by $\Lambda = (\lambda_{ij})$ (rows sum up to zero). The random graph $G(t)$ takes values in a finite set of graphs $\mathcal{G} = \{G_i\}_{i=1}^{s}$ with probability $Pr(G(t) = G_i) = Pr(\theta(t) = i)$, for $i = 1 \ldots s$. We denote by $q = (q_i)$ the initial distribution of $\theta(t)$.

Letting $x(t)$ denote the state of the $n$ agents, in the case $\theta(t)$ is a discrete-time Markov chain, we model the dynamics of the agents by the following linear stochastic difference equation

\[ x(t+1) = D_{\theta(t)} x(t), \quad x(0) = x_0, \tag{4.1} \]

where $D_{\theta(t)}$ is a random matrix taking values in the finite set $\mathcal{D} = \{D_i\}_{i=1}^{s}$, with probability distribution $Pr(D_{\theta(t)} = D_i) = Pr(\theta(t) = i)$. The matrices $D_i$ are stochastic matrices (rows sum up to one) with positive diagonal entries and correspond to the graphs $G_i$, for $i = 1 \ldots s$.

In the case $\theta(t)$ is a continuous-time Markov chain, we model the dynamics of the agents by the following linear stochastic equation

\[ dx(t) = C_{\theta(t)} x(t)\,dt, \quad x(0) = x_0, \tag{4.2} \]

where $C_{\theta(t)}$ is a random matrix taking values in the finite set $\mathcal{C} = \{C_i\}_{i=1}^{s}$, with probability distribution $Pr(C_{\theta(t)} = C_i) = Pr(\theta(t) = i)$. The matrices $C_i$ are generator-like matrices (rows sum up to zero) and correspond to the graphs $G_i$, for $i = 1 \ldots s$. The initial state $x(0) = x_0$, for both continuous and discrete models, is assumed deterministic. We will sometimes refer to the matrices belonging to the sets $\mathcal{D}$ and $\mathcal{C}$ as consensus matrices. The underlying probability space (for both models) is denoted by $(\Omega, \mathcal{F}, \mathcal{P})$ and the solution process $x(t, x_0, t_0)$ (or simply, $x(t)$) of (4.1) or (4.2) is a random process defined on $(\Omega, \mathcal{F}, \mathcal{P})$. We note that the stochastic dynamics (4.1) and (4.2) represent Markovian jump linear systems for discrete and continuous time, respectively. For a comprehensive study of the theory of (discrete-time) Markovian jump linear systems, the reader can refer to [11] for example.

Assumption 4.2.1. Throughout this chapter we assume that the matrices belonging to the
sets $\mathcal{D}$ and $\mathcal{C}$ are doubly stochastic (rows and columns sum up to one and zero, respectively)
and, in the case of the set $\mathcal{D}$, have positive diagonal entries. We also assume that
the Markov chain $\theta(t)$ is irreducible.

Remark 4.2.1. Consensus matrices that satisfy Assumption 4.2.1 can be constructed, for
instance, by using a Laplacian-based scheme in the case where the communication graph
is undirected or balanced (for every node, the inner degree is equal to the outer degree)
and possibly weighted. If $L_i$ denotes the Laplacian of the graph $G_i$, we can choose $D_i =
I - \epsilon L_i$ and $C_i = -L_i$, where $\epsilon > 0$ is chosen such that $D_i$ is stochastic.
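
As a minimal sketch of the Laplacian-based scheme in Remark 4.2.1, the following Python fragment builds $D_i = I - \epsilon L_i$ and $C_i = -L_i$ for a small undirected, unweighted graph. The helper names (`laplacian`, `consensus_matrices`), the path graph and the value $\epsilon = 0.25$ are illustrative assumptions and do not come from the original text.

```python
def laplacian(n, edges):
    """Laplacian L = Deg - Adj of an undirected, unweighted graph on n nodes."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1.0
        L[j][j] += 1.0
        L[i][j] -= 1.0
        L[j][i] -= 1.0
    return L


def consensus_matrices(n, edges, eps):
    """Return (D, C) with D = I - eps * L (discrete time) and C = -L (continuous time)."""
    L = laplacian(n, edges)
    D = [[(1.0 if i == j else 0.0) - eps * L[i][j] for j in range(n)] for i in range(n)]
    C = [[-L[i][j] for j in range(n)] for i in range(n)]
    return D, C


def row_sums(M):
    return [sum(row) for row in M]


def col_sums(M):
    return [sum(col) for col in zip(*M)]


# Hypothetical example: the path graph 0 - 1 - 2. Its maximum degree is 2,
# so any eps < 1/2 keeps every diagonal entry of D positive.
D, C = consensus_matrices(3, [(0, 1), (1, 2)], eps=0.25)
```

For an undirected graph the Laplacian is symmetric with zero row sums, so `D` has unit row and column sums and `C` has zero row and column sums, which is what Assumption 4.2.1 asks for.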

Definition 4.2.1. We say that $x(t)$ converges to average consensus

I. in the mean square sense, if for any $x_0 \in \mathbb{R}^n$ and initial distribution $q = (q_1, \ldots, q_s)$
of $\theta(t)$,

$$\lim_{t \to \infty} E\left[\|x(t) - \mathrm{av}(x_0)\mathbf{1}\|^2\right] = 0.$$

II. in the almost sure sense, if for any $x_0 \in \mathbb{R}^n$ and initial distribution $q = (q_1, \ldots, q_s)$ of
$\theta(t)$,

$$\Pr\left(\lim_{t \to \infty} \|x(t) - \mathrm{av}(x_0)\mathbf{1}\| = 0\right) = 1.$$

Assumption 4.2.1 will guarantee reaching average consensus, desirable in important
distributed computing applications such as distributed estimation [40] or distributed
optimization [34]. Any other scheme can be used as long as it produces matrices with the
properties stated above and it reflects the communication structures among agents.


Problem 4.2.1. Given the random processes $D_{\theta(t)}$ and $C_{\theta(t)}$, together with Assumption
4.2.1, we derive necessary and sufficient conditions such that the state vector $x(t)$, evolving
according to (4.1) or (4.2), converges to average consensus in the sense of Definition
4.2.1.


In  the  following  we  state  the  convergence  result  for  the  linear  consensus  problem 
under  Markovian  random  communication  topology. 

Theorem 4.2.1. The state vector $x(t)$, evolving according to the dynamics (4.1) (or (4.2)),
converges to average consensus in the sense of Definition 4.2.1 if and only if $G_{\mathcal{D}}$ (or $G_{\mathcal{C}}$)
is strongly connected.

The above theorem formulates an intuitively obvious condition for reaching consensus
under the linear scheme (4.1) or (4.2) and under the Markovian assumption on
the communication patterns. Namely, it expresses the need for persistent communication
paths among all agents. We defer the proof of this theorem to Section 4.4 and provide
here an intuitive, non-rigorous interpretation. Since $\theta(t)$ is irreducible, with probability
one all states are visited infinitely many times. Since the graph $G_{\mathcal{D}}$ (or $G_{\mathcal{C}}$) is
strongly connected, communication paths between all agents are formed infinitely many
times, which allows consensus to be achieved. Conversely, if the graph $G_{\mathcal{D}}$ (or $G_{\mathcal{C}}$)
is not strongly connected, then there exist at least two agents such that, for any sample
path of $\theta(t)$, no communication path between them (direct or indirect) is ever formed.
Consequently, consensus cannot be reached. Our main contribution is to prove Theorem
4.2.1 using an approach based on the stability theory of Markovian jump linear systems,
in conjunction with a set of results based on matrix and graph theory.
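
The intuition above can be illustrated with a small simulation of the discrete-time dynamics (4.1). The sketch below is only an illustration under stated assumptions: the four-agent network, the two graphs, the value of $\epsilon$, the transition matrix `P`, the initial chain state and the helper names are all hypothetical. Neither graph is connected on its own, but their union is, and the chain is irreducible.

```python
import random


def step_matrix(n, edges, eps):
    """Doubly stochastic D = I - eps * L for an undirected graph (Laplacian scheme)."""
    D = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        D[i][i] -= eps
        D[j][j] -= eps
        D[i][j] += eps
        D[j][i] += eps
    return D


def simulate(x0, matrices, P, steps, seed=0):
    """Iterate x(t+1) = D_{theta(t)} x(t), theta(t) a Markov chain with transition matrix P."""
    rng = random.Random(seed)
    theta = 0  # assumed initial state of the chain
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        D = matrices[theta]
        x = [sum(D[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Draw the next state of the chain from row theta of P.
        theta = rng.choices(range(len(P)), weights=P[theta])[0]
    return x


# Graph 1 links agents (0,1) and (2,3); graph 2 links agents (1,2).
D1 = step_matrix(4, [(0, 1), (2, 3)], 0.25)
D2 = step_matrix(4, [(1, 2)], 0.25)
P = [[0.5, 0.5], [0.5, 0.5]]  # irreducible two-state chain
x0 = [1.0, 2.0, 3.0, 6.0]     # average is 3.0
x_final = simulate(x0, [D1, D2], P, steps=2000)
```

Because every `D` is doubly stochastic, the sum of the entries of `x` is preserved at each step, so the only possible agreement value is the initial average.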

4.3  Preliminary  results 

This section starts with a set of general preliminary results, after which it continues
with results specific to the cases where the dynamics of the agents are expressed
in discrete and continuous time. The proof of Theorem 4.2.1 is mainly based on four
lemmas (Lemmas 4.3.4 and 4.3.5 for the discrete-time case and Lemmas 4.3.6 and 4.3.7 for
the continuous-time case), which state properties of some matrices that appear in the dynamic
equations of the first and second moments of the state vector. The proofs of these lemmas
are based on results introduced in the next subsection.

4.3.1 General preliminary results


This  subsection  contains  the  statement  of  a  number  of  preliminary  results  that  are 
needed  in  the  proofs  of  the  auxiliary  results  corresponding  to  the  discrete  and  continuous 
time  cases  and  in  the  proof  of  the  main  theorem. 

The  next  theorem  introduces  a  convergence  result  for  an  infinite  product  of  ergodic 
matrices  whose  proof  can  be  found  in  [54]. 

Theorem 4.3.1. ([54]) Let $s$ be a positive integer and let $\{A_i\}_{i=1}^s$ be a finite set of $n \times n$
ergodic matrices. Consider a map $r : \mathbb{N} \to \{1, \ldots, s\}$ such that for any finite sequence
$\{r(i)\}_{i=1}^j$ the matrix product $\prod_{i=1}^j A_{r(i)}$ is ergodic. Then, there exists a vector $c$ with
nonnegative entries (summing up to one), such that:

$$\lim_{j \to \infty} \prod_{i=1}^{j} A_{r(i)} = \mathbf{1}c'. \tag{4.3}$$

In the case where the matrices $\{A_i\}_{i=1}^s$ are doubly stochastic as well, from the above
theorem we can immediately obtain the following corollary.

Corollary 4.3.1. Under the same assumptions as in Theorem 4.3.1, if in addition the
matrices in the set $\{A_i\}_{i=1}^s$ are doubly stochastic, then

$$\lim_{j \to \infty} \prod_{i=1}^{j} A_{r(i)} = \frac{1}{n}\mathbf{1}\mathbf{1}'. \tag{4.4}$$

Proof. By Theorem 4.3.1 we have that

$$\lim_{j \to \infty} \prod_{i=1}^{j} A_{r(i)} = \mathbf{1}c'.$$

Since the matrices considered are doubly stochastic and ergodic, their transposes are ergodic
as well. Hence, by applying again Theorem 4.3.1 on the transpose versions of
$\{A_i\}_{i=1}^s$, we obtain that there exists a vector $d$ such that

$$\lim_{j \to \infty} \left(\prod_{i=1}^{j} A_{r(i)}\right)' = \mathbf{1}d'.$$

But since the stochastic matrix $\mathbf{1}c'$ must be equal to $d\mathbf{1}'$, the result follows. □
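
A quick numerical illustration of Corollary 4.3.1 is sketched below. The two matrices, the alternating switching map and the helper names are hypothetical choices made for this example only. Both matrices are doubly stochastic with strictly positive entries, so every finite product is again positive and hence ergodic.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


# Two hypothetical 3 x 3 doubly stochastic matrices with positive entries.
A1 = [[0.50, 0.25, 0.25],
      [0.25, 0.50, 0.25],
      [0.25, 0.25, 0.50]]
A2 = [[0.6, 0.3, 0.1],
      [0.1, 0.6, 0.3],
      [0.3, 0.1, 0.6]]


def product(r, j):
    """Left-to-right product A_{r(1)} A_{r(2)} ... A_{r(j)} for a switching map r."""
    mats = {1: A1, 2: A2}
    M = mats[r(1)]
    for i in range(2, j + 1):
        M = matmul(M, mats[r(i)])
    return M


# Alternate between the two matrices for 60 factors.
limit = product(lambda i: 1 + (i % 2), 60)
```

After 60 factors every entry of `limit` should be numerically indistinguishable from $1/n = 1/3$, in agreement with (4.4).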

Remark 4.3.1. The homogeneous finite state Markov chain corresponding to a doubly
stochastic transition matrix $P$ cannot have transient states. Indeed, since $P$ is doubly
stochastic, the same is true for $P^t$, for all $t \ge 1$. Assuming that there exists a transient state
$i$, then $\lim_{t \to \infty} (P^t)_{ji} = 0$ for all $j$, i.e. all entries on column $i$ converge to zero. But this
means that there exists some $t^*$ for which $\sum_j (P^{t^*})_{ji} < 1$, which contradicts the fact that $P^{t^*}$
must be doubly stochastic. An important implication is that we can relabel the vertices of
the Markov chain such that $P$ is block diagonalizable.

Remark 4.3.2. Since the Markov chain corresponding to a doubly stochastic transition/generator
matrix cannot have transient states, the Markov chain (seen as a graph)
has a spanning tree if and only if it is irreducible (strongly connected).

The next lemma gives a lower bound on a finite product of nonnegative matrices
in terms of the sum of the matrices that appear in the product. The proof of this result can be
found in [18].

Lemma 4.3.1. [18] Let $m \ge 2$ be a positive integer and let $\{A_i\}_{i=1}^m$ be a set of nonnegative
$n \times n$ matrices with positive diagonal elements. Then

$$\prod_{i=1}^{m} A_i \ge \gamma \sum_{i=1}^{m} A_i,$$

where $\gamma > 0$ depends on the matrices $A_i$, $i = 1, \ldots, m$.

In the following proposition we study the convergence properties of a particular
sequence of matrices.

Proposition 4.3.1. Consider a matrix $Q \in \mathbb{R}^{n \times n}$ such that $\|Q\|_1 \le 1$ and a set of matrices
$\mathcal{S} = \{S_1, \ldots, S_m\}$, for some positive integer $m < n$. Assume that there exists a subsequence
$\{t_k\} \subset \mathbb{N}$ such that $\mathcal{S}$ is a limit set of the sequence $\{Q^{t_k}\}_{k \ge 0}$ and that for any $S \in \mathcal{S}$, $QS \in \mathcal{S}$
as well. Then, $\mathcal{S}$ is a limit set of the sequence $\{Q^k\}_{k \ge 0}$, i.e.

$$\lim_{k \to \infty} d(Q^k, \mathcal{S}) = 0, \tag{4.5}$$

where $d(Q, \mathcal{S}) = \min_{S \in \mathcal{S}} \|Q - S\|$ and $\|\cdot\|$ is some arbitrary matrix norm.

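Proof. We will prove (4.5) for the particular case of the matrix norm one; the general
result then follows from the equivalence of norms. Pick a subsequence $\{t'_k\}_{k \ge 0}$ given by
$t'_k = t_k + \delta_k$, where $\delta_k \in \mathbb{N}$. Since $Q^{\delta_k} S \in \mathcal{S}$ for any $S \in \mathcal{S}$, it follows that

$$d(Q^{t'_k}, \mathcal{S}) \le \min_{S \in \mathcal{S}} \|Q^{\delta_k} Q^{t_k} - Q^{\delta_k} S\|_1 \le \|Q^{\delta_k}\|_1 \min_{S \in \mathcal{S}} \|Q^{t_k} - S\|_1 \le d(Q^{t_k}, \mathcal{S}).$$

Therefore, we get that $\mathcal{S}$ is a limit set for the sequence $\{Q^{t'_k}\}_{k \ge 0}$ and the result follows since
we can make $\{t'_k\}_{k \ge 0}$ arbitrary. □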

The  next  lemma  states  a  property  of  the  null  spaces  of  two  generator  matrices. 

Lemma 4.3.2. Let $A \in \mathbb{R}^{n \times n}$ and $B \in \mathbb{R}^{n \times n}$ be two block diagonalizable generator matrices.
Then

$$\mathrm{Null}(A + B) = \mathrm{Null}(A) \cap \mathrm{Null}(B).$$

Proof. Obviously, $\mathrm{Null}(A) \cap \mathrm{Null}(B) \subset \mathrm{Null}(A + B)$. In the following we show the opposite
inclusion. Since $A$ is block diagonalizable, there exists a similarity transformation
$T$ such that $\bar{A} = TAT'$ is a block diagonal generator matrix (with irreducible blocks). Let
$\bar{A}_i \in \mathbb{R}^{n_i \times n_i}$, $i = 1, \ldots, m$, denote the irreducible blocks on the main diagonal of $\bar{A}$, where $m$
is the number of such blocks and $\sum_{i=1}^m n_i = n$. The nullspace of $\bar{A}$ can be expressed as

$$\mathrm{Null}(\bar{A}) = \left\{ \begin{pmatrix} \alpha_1 \mathbf{1}_{n_1} \\ \vdots \\ \alpha_m \mathbf{1}_{n_m} \end{pmatrix} \;\middle|\; \alpha_l \in \mathbb{R},\ l = 1, \ldots, m \right\}.$$

We assumed that $B$ is block diagonalizable, which means that $G_B$ is a union of
isolated, strongly connected subgraphs, a property which remains valid for the graph corresponding
to $\bar{B} = TBT'$, since $G_{\bar{B}}$ is just a relabeled version of $G_B$. By adding $\bar{B}$ to $\bar{A}$
two phenomena can happen: we can either leave the graph $G_{\bar{A}}$ unchanged or we can create
new connections among the vertices of $G_{\bar{A}}$. In the first case, $G_{\bar{B}} \subset G_{\bar{A}}$ and therefore
$\mathrm{Null}(\bar{A} + \bar{B}) = \mathrm{Null}(\bar{A})$. In the second case we create new connections among the blocks
of $\bar{A}$. But since all the subgraphs of $\bar{B}$ are strongly connected, this means that if $\bar{A}_i$ becomes
connected to $\bar{A}_j$, then necessarily $\bar{A}_j$ becomes connected to $\bar{A}_i$, hence $\bar{A}_i$ and $\bar{A}_j$
form an irreducible (strongly connected) new block, whose nullspace is spanned by the
vector of all ones. Assuming that these are the only new connections that are added to
$G_{\bar{A}}$, the nullspace of $\bar{A} + \bar{B}$ will have a similar expression to the nullspace of $\bar{A}$ with the
main difference that the coefficients $\alpha_i$ and $\alpha_j$ will be equal. Therefore, in this particular
case, the nullspace of $\bar{A} + \bar{B}$ can be expressed as

$$\mathrm{Null}(\bar{A} + \bar{B}) = \left\{ \begin{pmatrix} \alpha_1 \mathbf{1}_{n_1} \\ \vdots \\ \alpha_m \mathbf{1}_{n_m} \end{pmatrix} \;\middle|\; \alpha_l \in \mathbb{R},\ \alpha_i = \alpha_j,\ l = 1, \ldots, m \right\}.$$

In general all blocks $\bar{A}_i$ which become interconnected after adding $\bar{B}$ will have equal coefficients
in the expression of the nullspace of $\bar{A} + \bar{B}$, compared to the nullspace of $\bar{A}$.
Therefore, $\mathrm{Null}(\bar{A} + \bar{B}) \subset \mathrm{Null}(\bar{A})$, which means also that $\mathrm{Null}(A + B) \subset \mathrm{Null}(A)$. Therefore,
if $(A + B)v = 0$, then $Av = 0$, which implies also that $Bv = 0$ or $v \in \mathrm{Null}(B)$. Hence if
$v \in \mathrm{Null}(A + B)$ then $v \in \mathrm{Null}(A) \cap \mathrm{Null}(B)$, which concludes the proof. □

In  the  next  corollary  we  present  a  property  of  the  eigenspaces  corresponding  to  the 
eigenvalue  one  of  a  set  of  probability  transition  matrices. 

Corollary 4.3.2. Let $s$ be a positive integer and let $\mathcal{A} = \{A_i\}_{i=1}^s$ be a set of doubly stochastic,
probability transition matrices. Then,

$$\mathrm{Null}\left(\sum_{i=1}^{s} (A_i - I)\right) = \bigcap_{i=1}^{s} \mathrm{Null}(A_i - I),$$

and $\dim\left(\mathrm{Null}\left(\sum_{i=1}^s (A_i - I)\right)\right) = 1$ if and only if $G_{\mathcal{A}}$ is strongly connected.

Proof. Since $A_i$, $i = 1, \ldots, s$, are doubly stochastic, the matrices $A_i - I$ are block diagonalizable doubly
stochastic generator matrices. Therefore, by recursively applying Lemma 4.3.2 $s - 1$
times, the first part of the Corollary follows. For the second part of the Corollary, note
that, by Corollary 3.5 of [39], $\frac{1}{s}\sum_{i=1}^s A_i$ has algebraic multiplicity equal to one for its
eigenvalue $\lambda = 1$ if and only if the graph associated with $\frac{1}{s}\sum_{i=1}^s A_i$ has a spanning tree, or in
our case is strongly connected, which in turn implies that $\dim\left(\mathrm{Null}\left(\sum_{i=1}^s (A_i - I)\right)\right) = 1$ if
and only if $G_{\mathcal{A}}$ is strongly connected. □

The  following  Corollary  is  an  immediate  consequence  of  Corollary  3.5  of  [39]. 

Corollary 4.3.3. A generator matrix $G$ has algebraic multiplicity equal to one for its
eigenvalue $\lambda = 0$ if and only if the graph associated with the matrix has a spanning tree.

Proof. Follows immediately from Corollary 3.5 of [39], by forming the probability transition
matrix $P = I + \epsilon G$, for some appropriate $\epsilon > 0$, and noting that $\mathrm{Null}(P - I) = \mathrm{Null}(G)$. □

The  following  Corollary  is  the  counterpart  of  Lemma  3.7  of  [39],  in  the  case  of 
generator  matrices. 

Corollary 4.3.4. Let $G \in \mathbb{R}^{n \times n}$ be a rate transition matrix. If $G$ has an eigenvalue $\lambda = 0$
with algebraic multiplicity equal to one, then $\lim_{t \to \infty} e^{Gt} = \mathbf{1}v'$, where $v$ is a nonnegative
vector satisfying $G'v = 0$ and $v'\mathbf{1} = 1$.

Proof. Choose $h_1 > 0$ and let $\{t_k^1\}_{k \ge 0}$ be a sequence given by $t_k^1 = h_1 k$, for all $k \ge 0$. Then

$$\lim_{k \to \infty} e^{G t_k^1} = \lim_{k \to \infty} e^{h_1 k G} = \lim_{k \to \infty} P_{h_1}^k,$$

where we defined $P_{h_1} = e^{h_1 G}$. From the theory of continuous-time Markov chains we
know that $P_{h_1}$ is a stochastic matrix with positive diagonal entries and that, given a vector
$x \in \mathbb{R}^n$, $x' P_{h_1} = x'$ if and only if $x' G = 0$. This means that the algebraic multiplicity of the
eigenvalue $\lambda = 1$ of $P_{h_1}$ is one. By Lemma 3.7 of [39], we have that $\lim_{k \to \infty} P_{h_1}^k = \mathbf{1}v_{h_1}'$,
where $v_{h_1}$ is a nonnegative vector satisfying $P_{h_1}' v_{h_1} = v_{h_1}$ and $v_{h_1}'\mathbf{1} = 1$. Also $G' v_{h_1} = 0$.
Choose another $h_2 > 0$ and let $P_{h_2} = e^{h_2 G}$. Similarly as above, we have that

$$\lim_{k \to \infty} P_{h_2}^k = \mathbf{1}v_{h_2}',$$

where $v_{h_2}$ satisfies similar properties as $v_{h_1}$. But since both vectors belong to the nullspace
of $G'$, which has dimension one, they must be equal. Indeed, if $x$ is a left eigenvector of $G$,
then $v_{h_1}$ and $v_{h_2}$ can be written as $v_{h_1} = \alpha_1 x$ and $v_{h_2} = \alpha_2 x$. However, since $\mathbf{1}'v_{h_1} = 1$ and
$\mathbf{1}'v_{h_2} = 1$, it follows that $\alpha_1 = \alpha_2$. We have shown that for any choice of $h > 0$,

$$\lim_{k \to \infty} e^{G t_k} = \lim_{k \to \infty} e^{h k G} = \mathbf{1}v',$$

where $v$ is a nonnegative vector satisfying $G'v = 0$ and $\mathbf{1}'v = 1$, and therefore the result
follows. □
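
The limit in Corollary 4.3.4 can be checked numerically on a small example. The sketch below approximates $e^{Gt}$ by scaling and squaring with a truncated Taylor series, which mirrors the sampling idea $e^{hkG} = P_h^k$ used in the proof. The generator `G`, the time `t = 20` and the helper names are assumptions chosen for illustration; for this `G` the normalized left null vector is $v = (0.2, 0.4, 0.4)$.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def expm(M, squarings=12, terms=20):
    """Approximate exp(M): Taylor series of exp(M / 2^squarings), then repeated squaring."""
    n = len(M)
    scale = 2.0 ** squarings
    S = [[M[i][j] / scale for j in range(n)] for i in range(n)]
    E = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in E]
    for k in range(1, terms + 1):
        # term holds S^k / k! after this update.
        term = [[v / k for v in row] for row in matmul(term, S)]
        E = [[E[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        E = matmul(E, E)
    return E


# Hypothetical irreducible generator (rows sum to zero, cycle 0 -> 1 -> 2 -> 0).
G = [[-2.0, 2.0, 0.0],
     [0.0, -1.0, 1.0],
     [1.0, 0.0, -1.0]]
v = [0.2, 0.4, 0.4]  # satisfies v'G = 0 and v'1 = 1
t = 20.0
E = expm([[g * t for g in row] for row in G])
```

Every row of `E` should be close to `v`, that is, $e^{Gt} \approx \mathbf{1}v'$ for large $t$.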

4.3.2 Preliminary results for the case where the agents’ dynamics are expressed in discrete-time

In this subsection we state and prove a set of results used to prove Theorem 4.2.1
in the case where the agents’ dynamics are expressed in discrete-time. Basically, these
results study the convergence properties of a sequence of matrices $\{Q^k\}_{k \ge 0}$, where $Q$ has a
particular structure which comes from the analysis of the first and second moments of the
state vector $x(t)$.

Lemma 4.3.3. Let $s$ be a positive integer and let $\{A_{ij}\}_{i,j=1}^s$ be a set of $n \times n$ doubly stochastic,
ergodic matrices. Let $P = (p_{ij})$ be an $s \times s$ stochastic matrix corresponding to an irreducible,
homogeneous Markov chain and let $P_\infty$ be the limit set of the sequence $\{P^k\}_{k \ge 0}$.
Consider the $ns \times ns$ dimensional matrix $Q$ whose $(i,j)$th block is defined by $Q_{ij} = p_{ji} A_{ij}$.
Then $P'_\infty \otimes \left(\frac{1}{n}\mathbf{1}\mathbf{1}'\right)$ is the limit set of the matrix sequence $\{Q^k\}_{k \ge 1}$, i.e.:

$$\lim_{k \to \infty} d\left(Q^k, P'_\infty \otimes \left(\frac{1}{n}\mathbf{1}\mathbf{1}'\right)\right) = 0. \tag{4.6}$$

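Proof. The proof of this lemma is based on Corollary 4.3.1. The $(i,j)$th block entry of the
matrix $Q^k$ can be expressed as

$$(Q^k)_{ij} = \sum_{1 \le i_1, \ldots, i_{k-1} \le s} p_{j i_1} p_{i_1 i_2} \cdots p_{i_{k-1} i}\, A_{i i_1} A_{i_1 i_2} \cdots A_{i_{k-1} j}. \tag{4.7}$$

Let $p^\infty_{ji}$ be the $(j,i)$ entry of an arbitrary matrix in $P_\infty$, i.e. there exists a sequence
$\{t_k\}_{k \ge 1} \subset \mathbb{N}$ such that $\lim_{k \to \infty} (P^{t_k})_{ji} = p^\infty_{ji}$.
We have that

$$\left\| (Q^k)_{ij} - p^\infty_{ji} \frac{1}{n}\mathbf{1}\mathbf{1}' \right\| \le \left\| \sum_{1 \le i_1, \ldots, i_{k-1} \le s} p_{j i_1} \cdots p_{i_{k-1} i} \left( A_{i i_1} \cdots A_{i_{k-1} j} - \frac{1}{n}\mathbf{1}\mathbf{1}' \right) \right\| + \left\| \left( \sum_{1 \le i_1, \ldots, i_{k-1} \le s} p_{j i_1} \cdots p_{i_{k-1} i} - p^\infty_{ji} \right) \frac{1}{n}\mathbf{1}\mathbf{1}' \right\| \le$$

$$\le \max_{i_1, \ldots, i_{k-1}} \left\| A_{i i_1} \cdots A_{i_{k-1} j} - \frac{1}{n}\mathbf{1}\mathbf{1}' \right\| \sum_{1 \le i_1, \ldots, i_{k-1} \le s} p_{j i_1} \cdots p_{i_{k-1} i} + \left| \sum_{1 \le i_1, \ldots, i_{k-1} \le s} p_{j i_1} \cdots p_{i_{k-1} i} - p^\infty_{ji} \right| \left\| \frac{1}{n}\mathbf{1}\mathbf{1}' \right\|,$$

where $\|\cdot\|$ was used to denote some matrix norm. Consider the limit of the left hand side
of the above inequality for the sequence $\{t_k\}_{k \ge 0}$. By Corollary 4.3.1 we know that

$$\lim_{k \to \infty} A_{i i_1} \cdots A_{i_{t_k - 1} j} = \frac{1}{n}\mathbf{1}\mathbf{1}'$$

for all sequences $i_1, \ldots, i_{t_k - 1}$, and since obviously

$$\lim_{k \to \infty} \sum_{1 \le i_1, \ldots, i_{t_k - 1} \le s} p_{j i_1} \cdots p_{i_{t_k - 1} i} = p^\infty_{ji},$$

it results

$$\lim_{k \to \infty} (Q^{t_k})_{ij} = p^\infty_{ji} \frac{1}{n}\mathbf{1}\mathbf{1}'.$$

Therefore $P'_\infty \otimes \left(\frac{1}{n}\mathbf{1}\mathbf{1}'\right)$ is the limit set for the sequence of matrices $\{Q^k\}_{k \ge 1}$.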


□

Lemma 4.3.4. Let $s$ be a positive integer and consider a set of doubly stochastic matrices
with positive diagonal entries, $\mathcal{D} = \{D_i\}_{i=1}^s$, such that the corresponding graph $G_{\mathcal{D}}$ is
strongly connected. Let $P$ be the $s \times s$ dimensional probability transition matrix of an irreducible,
homogeneous Markov chain and let $P_\infty$ be the limit set of the sequence $\{P^k\}_{k \ge 0}$.
Consider the $ns \times ns$ matrix $Q$ whose blocks are given by $Q_{ij} = p_{ji} D_j$. Then $P'_\infty \otimes \left(\frac{1}{n}\mathbf{1}\mathbf{1}'\right)$
is the limit set of the sequence of matrices $\{Q^k\}_{k \ge 1}$, i.e.:

$$\lim_{k \to \infty} d\left(Q^k, P'_\infty \otimes \left(\frac{1}{n}\mathbf{1}\mathbf{1}'\right)\right) = 0. \tag{4.8}$$


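Proof. Our strategy consists in showing that there exists a $k \in \mathbb{N}$ such that each $(i,j)$th
block matrix of $Q^k$ becomes a weighted ergodic matrix, i.e. $(Q^k)_{ij} = p^{(k)}_{ji} A^{(k)}_{ij}$, where $A^{(k)}_{ij}$
is ergodic and $p^{(k)}_{ji} = (P^k)_{ji}$. If this is the case, we can apply Lemma 4.3.3 to obtain (4.8).
The $(i,j)$th block matrix of $Q^k$ looks as in (4.7), with the difference that in the current case

$$(Q^k)_{ij} = \sum_{1 \le i_1, \ldots, i_{k-1} \le s} p_{j i_1} p_{i_1 i_2} \cdots p_{i_{k-1} i}\, D_j D_{i_1} \cdots D_{i_{k-1}} = p^{(k)}_{ji} A^{(k)}_{ij}, \tag{4.9}$$

where

$$A^{(k)}_{ij} = \sum_{1 \le i_1, \ldots, i_{k-1} \le s} \alpha_{i_1, \ldots, i_{k-1}}\, D_j D_{i_1} \cdots D_{i_{k-1}},$$

with

$$\alpha_{i_1, \ldots, i_{k-1}} = \begin{cases} p_{j i_1} p_{i_1 i_2} \cdots p_{i_{k-1} i} / p^{(k)}_{ji}, & p^{(k)}_{ji} > 0, \\ 0, & \text{otherwise.} \end{cases}$$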

Note that each of the matrix products $D_j D_{i_1} \cdots D_{i_{k-1}}$ appearing in $A^{(k)}_{ij}$ corresponds
to a path from node $j$ to node $i$ in $k - 1$ steps. Therefore, by the irreducibility assumption
on $P$, there exists a $k$ such that each matrix in the set $\mathcal{D}$ appears at least once in one of the
terms of the sum (4.9), i.e. $\{1, \ldots, s\} \subset \{i_1, \ldots, i_{k-1}\}$. Using a similar idea as in Lemma 1 in
[18] or Lemma 3.9 in [39], by Lemma 4.3.1, we lower bound such a term

$$D_j D_{i_1} \cdots D_{i_{k-1}} \ge \gamma \sum_{i=1}^{s} D_i = \gamma s \bar{D}, \tag{4.10}$$

where $\gamma > 0$ depends on the matrices in $\mathcal{D}$ and $\bar{D} = \frac{1}{s}\sum_{i=1}^s D_i$ is a doubly stochastic matrix with
positive diagonal entries.


Since $G_{\mathcal{D}}$ is strongly connected, the same is true for $G_{\bar{D}}$. Therefore, $\bar{D}$ corresponds
to an irreducible, aperiodic ($\bar{D}$ has positive diagonal entries) and hence ergodic, Markov
chain. By inequality (4.10), it follows that the matrix product $D_j D_{i_1} \cdots D_{i_{k-1}}$ is ergodic.
This is enough to infer that $A^{(k)}_{ij}$ is ergodic as well, since it is the result of a convex combination
of (doubly) stochastic matrices with at least one ergodic matrix in the combination.
Choose a $k^*$ large enough such that for all non-zero $p^{(k^*)}_{ji}$, the matrices $A^{(k^*)}_{ij}$ are ergodic
$\forall i, j$. Such a $k^*$ always exists due to the irreducibility assumption on $P$. Then according to
Lemma 4.3.3, we have that for the subsequence $\{t_m\}_{m \ge 0}$, with $t_m = m k^*$,

$$\lim_{m \to \infty} d\left(Q^{t_m}, P'_\infty \otimes \left(\frac{1}{n}\mathbf{1}\mathbf{1}'\right)\right) = 0. \tag{4.11}$$

The result follows by Proposition 4.3.1 since $\|Q\|_1 \le 1$ and since $Q\left(P'_\infty \otimes \left(\frac{1}{n}\mathbf{1}\mathbf{1}'\right)\right) = P'_\infty \otimes \left(\frac{1}{n}\mathbf{1}\mathbf{1}'\right)$.

□


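Lemma 4.3.5. Under the same assumptions as in Lemma 4.3.4, if we define the matrix
blocks of $Q$ as $Q_{ij} = p_{ji} D_j \otimes D_j$, then $P'_\infty \otimes \left(\frac{1}{n^2}\mathbf{1}\mathbf{1}'\right)$ is the limit set of the sequence $\{Q^k\}_{k \ge 1}$,
i.e.

$$\lim_{k \to \infty} d\left(Q^k, P'_\infty \otimes \left(\frac{1}{n^2}\mathbf{1}\mathbf{1}'\right)\right) = 0,$$

where the vector $\mathbf{1}$ above has dimension $n^2$.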


Proof. In the current setup (4.9) becomes:

$$(Q^k)_{ij} = \sum_{1 \le i_1, \ldots, i_{k-1} \le s} p_{j i_1} p_{i_1 i_2} \cdots p_{i_{k-1} i}\, (D_j \otimes D_j)(D_{i_1} \otimes D_{i_1}) \cdots (D_{i_{k-1}} \otimes D_{i_{k-1}}). \tag{4.12}$$

The result follows from the same arguments used in Lemma 4.3.4 together with the fact
that the matrix products in (4.12) can be written as $(D_j \otimes D_j)(D_{i_1} \otimes D_{i_1}) \cdots (D_{i_{k-1}} \otimes D_{i_{k-1}}) =
(D_j D_{i_1} \cdots D_{i_{k-1}}) \otimes (D_j D_{i_1} \cdots D_{i_{k-1}})$ and with the observation that the Kronecker product of
an ergodic matrix with itself produces an ergodic matrix as well. □
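
The mixed-product property of the Kronecker product used in the proof, $(D_1 \otimes D_1)(D_2 \otimes D_2) = (D_1 D_2) \otimes (D_1 D_2)$, can be checked on a small example. The matrices and helper names below are hypothetical and serve only as an illustration.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def kron(A, B):
    """Kronecker product: block (i, j) of the result is A[i][j] * B."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


# Two hypothetical doubly stochastic matrices with positive diagonal entries.
D1 = [[0.6, 0.3, 0.1],
      [0.1, 0.6, 0.3],
      [0.3, 0.1, 0.6]]
D2 = [[0.5, 0.5, 0.0],
      [0.5, 0.5, 0.0],
      [0.0, 0.0, 1.0]]

lhs = matmul(kron(D1, D1), kron(D2, D2))         # (D1 (x) D1)(D2 (x) D2)
rhs = kron(matmul(D1, D2), matmul(D1, D2))       # (D1 D2) (x) (D1 D2)
max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(9) for j in range(9))
```

The two sides agree up to rounding, and the rows of the $n^2 \times n^2$ product still sum to one, so the product remains a stochastic matrix.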

4.3.3 Preliminary results for the case where the agents’ dynamics are expressed in continuous-time

The following two lemmas emphasize geometric properties of two matrices arising
from the linear dynamics of the first and second moments of the state vector, in the
continuous-time case.

Lemma 4.3.6. Let $s$ be a positive integer and let $\mathcal{C} = \{C_i\}_{i=1}^s$ be a set of $n \times n$ doubly
stochastic matrices such that $G_{\mathcal{C}}$ is strongly connected. Consider also an $s \times s$ generator
matrix $\Lambda = (\lambda_{ij})$ corresponding to an irreducible Markov chain with stationary distribution
$\pi = (\pi_i)$. Define the matrices $A = \mathrm{diag}(C_i', i = 1, \ldots, s)$ and $B = \Lambda \otimes I$. Then $A + B$ has
an eigenvalue $\lambda = 0$ with algebraic multiplicity one and with corresponding right and left
eigenvectors given by $\mathbf{1}_{ns}$ and $(\pi_1 \mathbf{1}_n', \pi_2 \mathbf{1}_n', \ldots, \pi_s \mathbf{1}_n')$, respectively.

Proof. We first note that $A + B$ is a generator matrix and that both $A$ and $B$ are block diagonalizable
(indeed $A$ has doubly stochastic matrices on its main diagonal and $B$ contains
$n$ copies of the irreducible Markov chain corresponding to $\Lambda$). Therefore, $A + B$ has an
eigenvalue $\lambda = 0$ with algebraic multiplicity at least one.

Let $v$ be a vector in the null space of $A + B$. By Lemma 4.3.2, we have that $v \in
\mathrm{Null}(A)$ and $v \in \mathrm{Null}(B)$. Given the structure of $B$, $v$ must respect the following pattern:
$v' = (u', u', \ldots, u')$, with $u \in \mathbb{R}^n$ repeated $s$ times. But since $v \in \mathrm{Null}(A)$, we have that $C_i' u = 0$, $i = 1, \ldots, s$, or
$\bar{C} u = 0$, where $\bar{C} = \sum_{i=1}^s C_i'$. Since $G_{\mathcal{C}}$ was assumed strongly connected, $\bar{C}$ corresponds to
an irreducible Markov chain, and it follows that $u$ must be of the form $u = \alpha \mathbf{1}$, for some
$\alpha \in \mathbb{R}$. By backtracking, we get that $v = \alpha \mathbf{1}$, for some $\alpha \in \mathbb{R}$, and consequently $\mathrm{Null}(A +
B) = \mathrm{span}(\mathbf{1})$. Therefore, $\lambda = 0$ has algebraic multiplicity one, with right eigenvector
given by $\mathbf{1}$. By simple verification we note that $(\pi_1 \mathbf{1}', \pi_2 \mathbf{1}', \ldots, \pi_s \mathbf{1}')$ is a left eigenvector
corresponding to the eigenvalue $\lambda = 0$. □

Lemma 4.3.7. Let $s$ be a positive integer and let $\mathcal{C} = \{C_i\}_{i=1}^s$ be a set of $n \times n$ doubly
stochastic matrices such that $G_{\mathcal{C}}$ is strongly connected. Consider also an $s \times s$ generator
matrix $\Lambda = (\lambda_{ij})$ corresponding to an irreducible Markov chain with stationary distribution
$\pi = (\pi_i)$. Define the matrices $A = \mathrm{diag}(C_i' \oplus C_i', i = 1, \ldots, s)$ and $B = \Lambda \otimes I$. Then $A + B$
has an eigenvalue $\lambda = 0$ with algebraic multiplicity one, with corresponding right and left
eigenvectors given by $\mathbf{1}_{n^2 s}$ and $(\pi_1 \mathbf{1}_{n^2}', \ldots, \pi_s \mathbf{1}_{n^2}')$, respectively.

Proof. It is not difficult to check that $A + B$ is a generator matrix. Also we note that
$C_i' \oplus C_i' = C_i' \otimes I + I \otimes C_i'$ is block diagonalizable since both $C_i' \otimes I$ and $I \otimes C_i'$ are block
diagonalizable. Indeed, since $C_i$ is doubly stochastic, it is block diagonalizable. The
matrix $C_i' \otimes I$ contains $n$ isolated copies of $C_i'$ and therefore it is block diagonalizable.
Also, $I \otimes C_i'$ has $n$ blocks on its diagonal, each block being given by $C_i'$, and
it follows that it is block diagonalizable as well.

Let $v$ be a vector in the nullspace of $A + B$. By Lemma 4.3.2, $v \in \mathrm{Null}(A)$ and $v \in
\mathrm{Null}(B)$. From the structure of $B$ we note that $v$ must be of the form $v' = (u', \ldots, u')$, with
$u \in \mathbb{R}^{n^2}$ repeated $s$ times. Consequently we have that $(C_i' \oplus C_i')u = 0$, $i = 1, \ldots, s$, or $(\bar{C} \oplus \bar{C})u = 0$, where $\bar{C} =
\sum_{i=1}^s C_i'$. Since $G_{\mathcal{C}}$ is strongly connected, $\bar{C}$ is a generator matrix corresponding to an
irreducible Markov chain. By applying again Lemma 4.3.2 for the matrix $\bar{C} \oplus \bar{C} = I \otimes \bar{C} +
\bar{C} \otimes I$, we get that $u$ must have the form $u' = (\bar{u}', \ldots, \bar{u}')$, with $\bar{u} \in \mathbb{R}^n$ repeated $n$ times and $\bar{C}\bar{u} = 0$. But
$\bar{C}$ is irreducible and therefore $\bar{u} = \alpha \mathbf{1}_n$, or $u = \alpha \mathbf{1}_{n^2}$, or finally $v = \alpha \mathbf{1}_{n^2 s}$, where $\alpha \in \mathbb{R}$.
Consequently, $\mathrm{Null}(A + B) = \mathrm{span}(\mathbf{1})$, which means the eigenvalue $\lambda = 0$ has algebraic
multiplicity one. By simple verification, we note that $(\pi_1 \mathbf{1}_{n^2}', \pi_2 \mathbf{1}_{n^2}', \ldots, \pi_s \mathbf{1}_{n^2}')$ is a left
eigenvector corresponding to the zero eigenvalue. □

4.4  Proof  of  the  convergence  theorem 

The proof will focus on showing that the state vector $x(t)$ converges in the mean square
sense to average consensus. Equivalently, by making the change of variable $z(t) = x(t) -
\mathrm{av}(x_0)\mathbf{1}$, we will actually show that $z(t)$ is mean square stable for the initial condition
$z(0) = x_0 - \mathrm{av}(x_0)\mathbf{1}$, where $z(t)$ respects the same dynamic equation as $x(t)$. By results
from the stability theory of Markovian jump linear systems, mean square stability
also implies stability in the almost sure sense (see for instance Corollary 3.46 of [11] for the
discrete-time case or Theorem 2.1 of [15] for the continuous-time case, with the remark that
we are interested in the stability property being satisfied for a specific initial condition,
rather than for any initial condition), which for us implies that $x(t)$ converges almost surely
to average consensus.


We  first  prove  the  discrete-time  case  after  which  we  continue  with  the  proof  for  the 
continuous-time  case. 

4.4.1  Discrete-time  case  -  Sufficiency 


Proof. Let $V(t)$ denote the second moment of the state vector

$$V(t) = E[x(t)x(t)^T],$$

where we used $E$ to denote the expectation operator. The matrix $V(t)$ can be expressed as

$$V(t) = \sum_{i=1}^{s} V_i(t), \tag{4.13}$$

where $V_i(t)$ is given by

$$V_i(t) = E[x(t)x(t)^T \chi_{\{\theta(t) = i\}}], \tag{4.14}$$

$\chi_{\{\theta(t) = i\}}$ being the indicator function of the event $\{\theta(t) = i\}$.

The set of discrete coupled Lyapunov equations governing the evolution of the matrices
$V_i(t)$ is given by

$$V_i(t+1) = \sum_{j=1}^{s} p_{ji} D_j V_j(t) D_j^T, \quad i = 1, \ldots, s, \tag{4.15}$$

with initial conditions $V_i(0) = q_i x_0 x_0^T$. By defining $\eta(t) = \mathrm{col}(V_i(t), i = 1, \ldots, s)$, we obtain a
vectorized form of equations (4.15)

$$\eta(t+1) = \Gamma_d \eta(t), \tag{4.16}$$

where $\Gamma_d$ is an $n^2 s \times n^2 s$ matrix given by

$$\Gamma_d = \begin{pmatrix} p_{11} D_1 \otimes D_1 & \cdots & p_{s1} D_s \otimes D_s \\ \vdots & \ddots & \vdots \\ p_{1s} D_1 \otimes D_1 & \cdots & p_{ss} D_s \otimes D_s \end{pmatrix} \quad \text{and} \quad \eta_0 = \begin{pmatrix} q_1 \mathrm{col}(x_0 x_0') \\ \vdots \\ q_s \mathrm{col}(x_0 x_0') \end{pmatrix}.$$
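We note that $\Gamma_d$ satisfies all the assumptions of Lemma 4.3.5 and hence we get

$$\lim_{k \to \infty} d\left(\Gamma_d^k, P'_\infty \otimes \left(\frac{1}{n^2}\mathbf{1}\mathbf{1}'\right)\right) = 0, \tag{4.17}$$

where $P_\infty$ is the limit set of the matrix sequence $\{P^k\}_{k \ge 0}$. Using the observation that

$$\frac{1}{n^2}\mathbf{1}\mathbf{1}'\, \mathrm{col}(x_0 x_0') = \mathrm{av}(x_0)^2 \mathbf{1},$$

the limit of the sequence $\{\eta(t_k)\}_{k \ge 0}$, where $\{t_k\}_{k \ge 0}$ is such that $\lim_{k \to \infty} (P^{t_k})_{ij} = p^\infty_{ij}$, is

$$\lim_{k \to \infty} \eta(t_k)' = \mathrm{av}(x_0)^2 \left( \sum_{j=1}^{s} p^\infty_{j1} q_j \mathbf{1}', \; \ldots, \; \sum_{j=1}^{s} p^\infty_{js} q_j \mathbf{1}' \right).$$

By collecting the entries of $\lim_{k \to \infty} \eta(t_k)$ we obtain

$$\lim_{k \to \infty} V_i(t_k) = \mathrm{av}(x_0)^2 \left( \sum_{j=1}^{s} p^\infty_{ji} q_j \right) \mathbf{1}\mathbf{1}',$$

and from (4.13) we get

$$\lim_{k \to \infty} V(t_k) = \mathrm{av}(x_0)^2 \mathbf{1}\mathbf{1}', \tag{4.18}$$

since $\sum_{i,j=1}^{s} p^\infty_{ji} q_j = 1$. By repeating the previous steps for all subsequences generating
limit points for $\{P^k\}_{k \ge 0}$ we obtain that (4.18) holds for any sequence in $\mathbb{N}$.

Through a similar process as in the case of the second moment (instead of Lemma
4.3.5 we use Lemma 4.3.4), we show that:

$$\lim_{t \to \infty} E[x(t)] = \mathrm{av}(x_0)\mathbf{1}. \tag{4.19}$$

From (4.18) and (4.19) we have that

$$\lim_{t \to \infty} E\left[\|x(t) - \mathrm{av}(x_0)\mathbf{1}\|^2\right] = \lim_{t \to \infty} \mathrm{trace}\left(E\left[(x(t) - \mathrm{av}(x_0)\mathbf{1})(x(t) - \mathrm{av}(x_0)\mathbf{1})'\right]\right) =$$

$$= \lim_{t \to \infty} \mathrm{trace}\left(E[x(t)x(t)'] - \mathrm{av}(x_0)\mathbf{1}E[x(t)'] - \mathrm{av}(x_0)E[x(t)]\mathbf{1}' + \mathrm{av}(x_0)^2 \mathbf{1}\mathbf{1}'\right) = 0.$$

Therefore, $x(t)$ converges to average consensus in the mean square sense, and consequently
in the almost sure sense, as well. □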

4.4.2  Discrete-time  case  -  Necessity 

Proof. If $G_{\mathcal{D}}$ is not strongly connected then by Corollary 4.3.2, $\dim\left(\bigcap_{i=1}^s \mathrm{Null}(D_i - I)\right) >
1$. Consequently, there exists a vector $v \in \bigcap_{i=1}^s \mathrm{Null}(D_i - I)$ such that $v \notin \mathrm{span}(\mathbf{1})$. If we
choose $v$ as initial condition, for every realization of $\theta(t)$, we have that

$$x(t) = v, \quad \text{for all } t \ge 0,$$

and therefore consensus cannot be reached in the sense of Definition 4.2.1. □
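
A small counterexample in the spirit of the necessity argument is sketched below. The two graphs, the value of $\epsilon$, the switching sequence and the helper names are hypothetical. The union of the two graphs has two components, $\{0, 1\}$ and $\{2, 3\}$, so a vector that is constant on each component is a common fixed point of both consensus matrices.

```python
def step_matrix(n, edges, eps):
    """Doubly stochastic D = I - eps * L for an undirected graph (Laplacian scheme)."""
    D = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        D[i][i] -= eps
        D[j][j] -= eps
        D[i][j] += eps
        D[j][i] += eps
    return D


def apply(D, x):
    """Matrix-vector product D x."""
    return [sum(D[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]


D1 = step_matrix(4, [(0, 1)], 0.25)   # graph 1: only agents 0 and 1 communicate
D2 = step_matrix(4, [(2, 3)], 0.25)   # graph 2: only agents 2 and 3 communicate

v = [1.0, 1.0, 0.0, 0.0]              # constant on each component, not in span(1)
x = list(v)
for theta in [0, 1, 1, 0, 1, 0, 0, 1]:   # an arbitrary switching sequence
    x = apply([D1, D2][theta], x)
```

Whatever the switching sequence, `x` stays equal to `v`, so the agents never agree on the average value 0.5.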


4.4.3  Continuous  time  -  Sufficiency 


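Using the same notation as in the discrete-time case, the dynamic equations describing
the evolution of the second moment of $x(t)$ are given by

$$\frac{d}{dt}V_i(t) = C_i V_i(t) + V_i(t) C_i^T + \sum_{j=1}^{s} \lambda_{ji} V_j(t), \quad i = 1, \ldots, s, \tag{4.20}$$

equations whose derivation is treated in [16]. By defining the vector $\eta(t) = \mathrm{col}(V_i(t), i =
1, \ldots, s)$, the vectorized equivalent of equations (4.20) is given by

$$\frac{d}{dt}\eta(t) = \Gamma_c \eta(t), \tag{4.21}$$

where

$$\Gamma_c = \begin{pmatrix} C_1 \oplus C_1 & 0 & \cdots & 0 \\ 0 & C_2 \oplus C_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & C_s \oplus C_s \end{pmatrix} + \Lambda' \otimes I \quad \text{and} \quad \eta_0 = \begin{pmatrix} q_1 \mathrm{col}(x_0 x_0') \\ q_2 \mathrm{col}(x_0 x_0') \\ \vdots \\ q_s \mathrm{col}(x_0 x_0') \end{pmatrix}.$$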

By  Lemma  4.3.7,  the  eigenspace  corresponding  to  the  zero  eigenvalue  of  rc  has  di¬ 
mension  one,  with  unique  (up  to  the  multiplication  by  a  scalar)  left  and  right  eigenvectors 
given  by  tn2s  and  l'2,7T2l'2, . .  respectively.  Since  T'  is  a  generator  matrix 

with  an  eigenvalue  zero  of  algebraic  multiplicity  one,  by  Corollary  4.3.4  we  have  that 
lim^oo  er7  =  vl',  where  V  =  ,712V , . .  .,nsV).  Therefore,  as  t  goes  to  infinity,  we 

have  that 


vl',  where  V  -  ^{n\V  ,112V , 

( 

„  ii' 

„  11' 

TTl  — j”  •  •  • 

nL 

7Tl  — 2" 

Mz 

lim  q{t)  -  : 

t — >00 

_  ll' 

_  11' 

7Tc  j 

V  *  n2 

7T  s  9 

A  ft2 

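By noting that

\[ \frac{\mathbb{1}\mathbb{1}'}{n^2} \mathrm{col}(x_0 x_0') = av(x_0)^2 \mathbb{1}_{n^2}, \]

and that the initial probabilities $q_i$ sum to one, we further get

\[ \lim_{t \to \infty} \eta(t) = av(x_0)^2 \begin{pmatrix} \pi_1 \mathbb{1}_{n^2} \\ \vdots \\ \pi_s \mathbb{1}_{n^2} \end{pmatrix}. \]

Rearranging the columns of $\lim_{t \to \infty} \eta(t)$, we get

\[ \lim_{t \to \infty} V_i(t) = av(x_0)^2 \pi_i \mathbb{1}\mathbb{1}', \]

or

\[ \lim_{t \to \infty} V(t) = av(x_0)^2 \mathbb{1}\mathbb{1}'. \]
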

Through a similar process (using this time Lemma 4.3.6), we can show that

\[ \lim_{t \to \infty} E[x(t)] = av(x_0) \mathbb{1}. \]

Therefore, $x(t)$ converges to average consensus in the mean square sense and consequently in the almost sure sense.
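
The limit derived above can be checked numerically on a small case. The following is only a sketch under stated assumptions: the matrices `C1`, `C2`, the generator `Lam`, the initial distribution `q` and the initial state `x0` are hypothetical data chosen for illustration, and forward-Euler integration stands in for the matrix exponential (both share the same limit here, since the step size keeps the iteration stable).

```python
import numpy as np

def kron_sum(C):
    """Kronecker sum C (+) C = C (x) I + I (x) C."""
    I = np.eye(C.shape[0])
    return np.kron(C, I) + np.kron(I, C)

def build_gamma_c(C_list, Lam):
    """Gamma_c = diag(C_i (+) C_i) + Lam' (x) I, following (4.21)."""
    n2 = C_list[0].shape[0] ** 2
    G = np.kron(Lam.T, np.eye(n2))
    for i, C in enumerate(C_list):
        blk = slice(i * n2, (i + 1) * n2)
        G[blk, blk] += kron_sum(C)
    return G

# Hypothetical data: two graphs on 3 nodes, neither connected, union connected.
C1 = np.array([[-1.0, 1.0, 0.0], [1.0, -1.0, 0.0], [0.0, 0.0, 0.0]])
C2 = np.array([[0.0, 0.0, 0.0], [0.0, -1.0, 1.0], [0.0, 1.0, -1.0]])
Lam = np.array([[-1.0, 1.0], [2.0, -2.0]])  # generator of theta(t)
pi = np.array([2.0, 1.0]) / 3.0             # stationary distribution, pi' Lam = 0
q = np.array([0.5, 0.5])                    # initial distribution of theta(t)
x0 = np.array([1.0, 2.0, 6.0])
n, s = 3, 2

Gamma_c = build_gamma_c([C1, C2], Lam)
eta = np.concatenate([qi * np.outer(x0, x0).flatten(order="F") for qi in q])

# Forward-Euler integration of d(eta)/dt = Gamma_c eta over a long horizon.
h, steps = 0.05, 20000
for _ in range(steps):
    eta = eta + h * (Gamma_c @ eta)

# Un-vectorize the blocks V_i and sum them to obtain V.
V_blocks = [eta[i * n * n:(i + 1) * n * n].reshape(n, n, order="F") for i in range(s)]
V_lim = sum(V_blocks)
av = x0.mean()
```

In this sketch each `V_blocks[i]` should approach `pi[i] * av**2` times the all-ones matrix, and `V_lim` should approach `av**2` times the all-ones matrix, in line with the limits stated above.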

4.4.4  Continuous  time  -  Necessity 

The proof follows the same lines as in the discrete-time case.

4.5  Discussion 

In the previous sections we proved a convergence result for the stochastic, linear consensus problem, for the cases where the dynamics of the agents are expressed in both discrete and continuous time. Our main contributions consist of considering a Markovian process, not necessarily ergodic, as the underlying process for the random communication graph, and of using an approach inspired by Markovian jump system theory to prove this result. In what we have shown, we assumed that the Markov process $\theta(t)$ is irreducible and that the matrices $D_i$ and $C_i$ are doubly stochastic. We can assume, for instance, that $\theta(t)$ is not irreducible (i.e. $\theta(t)$ may have transient states). We treated this case in [23] (only for discrete-time dynamics), and we showed that convergence in the sense of Definition 4.2.1 is achieved if and only if the union of graphs corresponding to each of the irreducible closed sets of states of the Markov chain produces a strongly connected graph. This should be intuitively clear since the probability of returning to a transient state converges to zero as time goes to infinity, and therefore the influence of the matrices $D_i$ (or $C_i$) corresponding to the transient states vanishes. We can also assume that $D_i$ and $C_i$ are not necessarily doubly stochastic. We treated this case (again only for the discrete-time dynamics and without being completely rigorous) in [26] and we showed that the state converges in the mean square sense and in the almost sure sense to consensus, and not necessarily to average consensus. From a technical point of view, the difference lies in the fact that the $n^2 \times n^2$ block matrices of $\{\Gamma_d^t\}_{t \geq 0}$ (or $\{e^{\Gamma_c t}\}_{t \geq 0}$) no longer converge to $\pi_i \frac{\mathbb{1}\mathbb{1}'}{n^2}$ but to $\pi_i \mathbb{1}c'$, for some vector $c \in \mathbb{R}^{n^2}$ with non-negative entries summing up to one; in general this vector $c$ cannot be determined a priori. In relevant distributed computation applications (such as distributed state estimation or distributed optimization), however, convergence to average consensus is desired, and therefore the assumption that $D_i$ or $C_i$ are doubly stochastic makes sense.

The proof of Theorem 4.2.1 was based on the analysis of two matrix sequences, $\{e^{\Gamma_c t}\}_{t \geq 0}$ and $\{\Gamma_d^t\}_{t \geq 0}$, arising from the dynamic equations of the state's second moment, for continuous and discrete time, respectively. The reader may have noted that we approached the analysis of the two sequences differently. In the case of continuous-time dynamics, our approach was based on showing that the left and right eigenspaces induced by the zero eigenvalue of $\Gamma_c$ have dimension one, and we provided the left and right eigenvectors (bases of the respective subspaces). The convergence of $\{e^{\Gamma_c t}\}_{t \geq 0}$ followed from Corollary 4.3.4. In the case of the discrete-time dynamics, we analyzed the sequence $\{\Gamma_d^t\}_{t \geq 0}$ by looking at how the matrix blocks of $\Gamma_d^t$ evolve as $t$ goes to infinity. Although, similar to the continuous-time case, we could have proved properties of $\Gamma_d$ related to the left and right eigenspaces induced by the eigenvalue one, this would not have been enough in the discrete-time case. This is because, through $\theta(t)$, $\Gamma_d$ can be periodic, in which case the sequence $\{\Gamma_d^t\}_{t \geq 0}$ does not converge (remember that in discrete-time consensus problems, the stochastic matrices are assumed to have positive diagonal entries, to avoid the possibility of being periodic).

In the case of i.i.d. random graphs [44], or more generally, in the case of strictly stationary, ergodic random graphs [45], a necessary and sufficient condition for reaching consensus almost surely (in the discrete-time case) is $|\lambda_2(E[D_{\theta(t)}])| < 1$, where $\lambda_2$ denotes the eigenvalue with second largest modulus. In the case of a Markovian random topology a similar condition does not necessarily hold, neither for each time $t$, nor in the limit. Take, for instance, two (symmetric) stochastic matrices $D_1$ and $D_2$ such that each of the graphs $G_{D_1}$ and $G_{D_2}$, respectively, is not strongly connected but their union is. If the two-state Markov chain $\theta(t)$ is periodic, with transitions given by $p_{11} = p_{22} = 0$ and $p_{12} = p_{21} = 1$, we note that $|\lambda_2(E[D_{\theta(t)}])| = 1$, for all $t \geq 0$. Also note that $\lambda_2(\lim_{t \to \infty} E[D_{\theta(t)}])$ does not exist since the sequence $(E[D_{\theta(t)}])_{t \geq 0}$ does not have a limit. Yet, consensus is reached. The assumption that allowed the aforementioned necessary and sufficient condition to hold was that $\theta(t)$ is a stationary process (which in turn implies that $E[D_{\theta(t)}]$ is constant for all $t \geq 0$). However, this is not necessarily true if $\theta(t)$ is a (homogeneous) irreducible Markov chain, unless the initial distribution is the stationary distribution.
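
The periodic two-state example above can be reproduced with a few lines of NumPy. This is a minimal sketch: the matrices `D1`, `D2` and the initial state `x0` are hypothetical choices (pairwise averaging on three nodes), and the chain is assumed to start deterministically in the first state.

```python
import numpy as np

def second_largest_modulus(M):
    """Modulus of the eigenvalue of M with second largest modulus."""
    return np.sort(np.abs(np.linalg.eigvals(M)))[::-1][1]

# Symmetric, doubly stochastic matrices with positive diagonals.
# Graph of D1 links nodes 0-1 only, graph of D2 links nodes 1-2 only.
D1 = np.array([[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]])
D2 = np.array([[1.0, 0.0, 0.0], [0.0, 0.5, 0.5], [0.0, 0.5, 0.5]])

# Periodic chain: theta alternates between the two states at every step.
x0 = np.array([1.0, 2.0, 6.0])
x = x0.copy()
for t in range(60):
    x = (D1 if t % 2 == 0 else D2) @ x
```

With this deterministic start, `E[D_theta(t)]` equals `D1` or `D2` at every step, so its second largest eigenvalue modulus stays at one, while `x` still approaches the average of `x0`.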

For the discrete-time case we can formulate a result involving the second largest eigenvalue of the time average expectation of $D_{\theta(t)}$, i.e. $\lim_{N \to \infty} \frac{\sum_{t=0}^{N} E[D_{\theta(t)}]}{N}$, which reflects the proportion of time $D_{\theta(t)}$ spends in each state of the Markov chain.


Proposition 4.5.1. Consider the stochastic system (4.1). Then, under Assumption 4.2.1, the state vector $x(t)$ converges to average consensus in the sense of Definition 4.2.1, if and only if

\[ \left| \lambda_2\left( \lim_{N \to \infty} \frac{\sum_{t=0}^{N} E[D_{\theta(t)}]}{N} \right) \right| < 1. \]



Proof. The time average of $E[D_{\theta(t)}]$ can be explicitly written as

\[ \lim_{N \to \infty} \frac{\sum_{t=0}^{N} E[D_{\theta(t)}]}{N} = \sum_{i=1}^{s} \pi_i D_i = \bar{D}, \]

where $\pi = (\pi_i)$ is the stationary distribution of $\theta(t)$. By Corollary 3.5 in [39], $|\lambda_2(\bar{D})| < 1$ if and only if the graph corresponding to $\bar{D}$ has a spanning tree, or in our case, is strongly connected. But the graph corresponding to $\bar{D}$ is the same as the union of the graphs $G_{D_i}$, and the result follows from Theorem 4.2.1. □
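
As an illustration of the time-averaged condition, the sketch below computes the stationary distribution of a periodic two-state chain, forms the averaged matrix, and evaluates its second largest eigenvalue modulus. The matrices `D1`, `D2` and the transition matrix `P` are hypothetical example data, and the least-squares solve is just one convenient way to obtain the stationary distribution.

```python
import numpy as np

D1 = np.array([[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]])
D2 = np.array([[1.0, 0.0, 0.0], [0.0, 0.5, 0.5], [0.0, 0.5, 0.5]])
D = [D1, D2]
P = np.array([[0.0, 1.0], [1.0, 0.0]])  # periodic chain: p12 = p21 = 1

# Stationary distribution: solve pi' P = pi' together with sum(pi) = 1.
s = P.shape[0]
A_ls = np.vstack([P.T - np.eye(s), np.ones((1, s))])
b_ls = np.concatenate([np.zeros(s), [1.0]])
pi = np.linalg.lstsq(A_ls, b_ls, rcond=None)[0]

# Averaged matrix D_bar = sum_i pi_i D_i and its second largest eigenvalue modulus.
D_bar = sum(p * Di for p, Di in zip(pi, D))
lam2 = np.sort(np.abs(np.linalg.eigvals(D_bar)))[::-1][1]

# Empirical time average when the chain starts in the first state.
N = 1000
cesaro = sum(D[t % 2] for t in range(N)) / N
```

Here the time average `cesaro` coincides with `D_bar` even though `E[D_theta(t)]` itself has no limit, and `lam2` is strictly below one because the union graph is connected.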


Unlike in the discrete-time case, for continuous-time dynamics we know that if there exists a stationary distribution $\pi$ (under the irreducibility assumption), the probability distribution of $\theta(t)$ converges to $\pi$; hence time averaging is not necessary. In the following we introduce (without proof, since it is essentially similar to the proof of Proposition 4.5.1) a necessary and sufficient condition for reaching average consensus, involving the expected value of the second largest eigenvalue of $C_{\theta(t)}$, for the continuous-time dynamics.

Proposition 4.5.2. Consider the stochastic system (4.2). Then, under Assumption 4.2.1, the state vector $x(t)$ converges to average consensus in the sense of Definition 4.2.1, if and only if


Our analysis also provides estimates of the rate of convergence to average consensus in the mean square sense. From the linear dynamic equations of the state's second moment we notice that the eigenvalues of $\Gamma_d$ and $\Gamma_c$ dictate how fast the second moment converges to average consensus. Since $\Gamma_d'$ is a probability transition matrix and since $\Gamma_c'$ is a generator matrix, an estimate of the rate of convergence of the second moment of $x(t)$ to average consensus is given by the second largest eigenvalue (in modulus) of $\Gamma_d$, for the discrete-time case, and by the real part of the second largest eigenvalue of $\Gamma_c$, for the continuous-time case.

Chapter 5


Distributed  Consensus-Based  Linear  Filtering 
5.1  Introduction 

In this chapter we address the consensus-based distributed linear filtering problem. We assume that each agent updates its (local) estimate in two steps. In the first step, an update is produced using a Luenberger observer type of filter. In the second step, called the consensus step, every sensor computes a convex combination between its local update and the updates received from the neighboring sensors. Our focus is not on designing the consensus weights, but on designing the filter gains. For given consensus weights, we will first give sufficient conditions for the existence of filter gains such that the dynamics of the estimation errors (without noise) are asymptotically stable. These sufficient conditions are also expressible in terms of the feasibility of a set of linear matrix inequalities. Next, we present a distributed (in the sense that each sensor uses only information available within its neighborhood), sub-optimal filtering algorithm, valid for time varying topologies as well, resulting from minimizing an upper bound on a quadratic cost expressed in terms of the covariance matrices of the estimation errors. In the case where the matrices defining the stochastic process and the consensus weights are time invariant, we present sufficient conditions such that the aforementioned distributed algorithm produces filter gains which converge and ensure the stability of the dynamics of the covariance matrices of the estimation errors. We will also present a connection between the consensus-based linear filter and the linear filtering of an appropriately defined Markovian jump linear system. More precisely, we show that if the aforementioned Markovian jump linear system is (mean square) detectable then the stochastic process is detectable as well under the consensus-based distributed linear filtering scheme. Finally, we show that the optimal gains of a linear filter for the state estimation of the Markovian jump linear system can be used to approximate the optimal gains of the consensus-based distributed linear filtering strategy.

Chapter structure: In Section 5.2 we describe the problems addressed in this chapter. Section 5.3 introduces the sufficient conditions for detectability under the consensus-based linear filtering scheme together with a test expressed in terms of the feasibility of a set of linear matrix inequalities. In Section 5.4 we present a sub-optimal distributed consensus-based linear filtering scheme with quantifiable performance. Section 5.5 makes a connection between the consensus-based distributed linear filtering algorithm and the linear filtering scheme for a Markovian jump linear system.

Notations and Abbreviations: We represent the property of positive (semi-positive) definiteness of a symmetric matrix $A$ by $A > 0$ ($A \geq 0$). By convention, we say that a symmetric matrix $A$ is negative definite (semi-definite) if $-A > 0$ ($-A \geq 0$) and we denote this by $A < 0$ ($A \leq 0$). By $A > B$ we understand that $A - B$ is positive definite. Given a set of square matrices $\{A_i\}_{i=1}^{N}$, by $\mathrm{diag}(A_i, i = 1 \ldots N)$ we understand the block diagonal matrix which contains the matrices $A_i$ on the main diagonal. We use the abbreviations CBDLF, MJLS and LMI for consensus-based distributed linear filter(ing), Markovian jump linear system and linear matrix inequality, respectively.

Remark 5.1.1. Given a positive integer $N$, a set of vectors $\{x_i\}_{i=1}^{N}$, a set of non-negative scalars $\{p_i\}_{i=1}^{N}$ summing up to one and a positive definite matrix $Q$, the following holds

\[ \left( \sum_{i=1}^{N} p_i x_i \right)' Q \left( \sum_{i=1}^{N} p_i x_i \right) \leq \sum_{i=1}^{N} p_i x_i' Q x_i. \]

Remark 5.1.2. Given a positive integer $N$, a set of vectors $\{x_i\}_{i=1}^{N}$, a set of matrices $\{A_i\}_{i=1}^{N}$, and a set of non-negative scalars $\{p_i\}_{i=1}^{N}$ summing up to one, the following holds

\[ \left( \sum_{i=1}^{N} p_i A_i x_i \right) \left( \sum_{i=1}^{N} p_i A_i x_i \right)' \leq \sum_{i=1}^{N} p_i A_i x_i x_i' A_i'. \tag{5.1} \]
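
Both remarks are convexity (Jensen-type) inequalities and can be spot-checked numerically. The sketch below uses randomly generated data, which is purely illustrative; the variable names are not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 4, 3
xs = [rng.standard_normal(n) for _ in range(N)]
As = [rng.standard_normal((n, n)) for _ in range(N)]
p = rng.random(N)
p /= p.sum()                      # non-negative weights summing to one
B = rng.standard_normal((n, n))
Q = B @ B.T + np.eye(n)           # positive definite matrix

# Remark 5.1.1: quadratic form of the weighted mean vs. weighted mean of quadratic forms.
m = sum(pi * xi for pi, xi in zip(p, xs))
lhs_511 = m @ Q @ m
rhs_511 = sum(pi * (xi @ Q @ xi) for pi, xi in zip(p, xs))

# Remark 5.1.2: the difference of the two sides should be positive semi-definite.
z = sum(pi * (Ai @ xi) for pi, Ai, xi in zip(p, As, xs))
lhs_512 = np.outer(z, z)
rhs_512 = sum(pi * np.outer(Ai @ xi, Ai @ xi) for pi, Ai, xi in zip(p, As, xs))
gap_eigs = np.linalg.eigvalsh(rhs_512 - lhs_512)
```

In the second check, the difference `rhs_512 - lhs_512` is a weighted covariance of the vectors `A_i x_i`, which is why its eigenvalues are non-negative up to rounding.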


5.2  Problem  formulation 


We consider a stochastic process modeled by a discrete-time linear dynamic equation

\[ x(k+1) = A(k)x(k) + w(k), \quad x(0) = x_0, \tag{5.2} \]

where $x(k) \in \mathbb{R}^n$ is the state vector and $w(k) \in \mathbb{R}^n$ is a driving noise, assumed Gaussian with zero mean and (possibly time varying) covariance matrix $\Sigma_w(k)$. The initial condition $x_0$ is assumed to be Gaussian with mean $\mu_0$ and covariance matrix $\Sigma_0$. The state of the process is observed by a network of $N$ sensors indexed by $i$, whose sensing models are given by

\[ y_i(k) = C_i(k)x(k) + v_i(k), \quad i = 1 \ldots N, \tag{5.3} \]

where $y_i(k) \in \mathbb{R}^{r_i}$ is the observation made by sensor $i$ and $v_i(k) \in \mathbb{R}^{r_i}$ is the measurement noise, assumed Gaussian with zero mean and (possibly time varying) covariance matrix $\Sigma_{v_i}(k)$. We assume that the matrices $\{\Sigma_{v_i}(k)\}_{i=1}^{N}$ and $\Sigma_w(k)$ are positive definite for $k \geq 0$ and that the initial state $x_0$ and the noises $v_i(k)$ and $w(k)$ are independent for all $k \geq 0$. For later reference we also define $\Sigma_{v_i}^{1/2}(k)$ and $\Sigma_w^{1/2}(k)$, where $\Sigma_{v_i}(k) = \Sigma_{v_i}^{1/2}(k)\Sigma_{v_i}^{1/2}(k)$ and $\Sigma_w(k) = \Sigma_w^{1/2}(k)\Sigma_w^{1/2}(k)$.

The set of sensors forms a communication network whose topology is modeled by a directed graph that describes the information exchanged among agents. The goal of the agents is to (locally) compute estimates of the state of the process (5.2).

Let $\hat{x}_i(k)$ denote the state estimate computed by sensor $i$ at time $k$ and let $\epsilon_i(k)$ denote the estimation error, i.e. $\epsilon_i(k) = x(k) - \hat{x}_i(k)$. The covariance matrix of the estimation error of sensor $i$ is denoted by $\Sigma_i(k) = E[\epsilon_i(k)\epsilon_i(k)']$, with $\Sigma_i(0) = \Sigma_0$.

The sensors update their estimates in two steps. In the first step, an intermediate estimate, denoted by $\varphi_i(k)$, is produced using a Luenberger observer filter

\[ \varphi_i(k) = A(k)\hat{x}_i(k) + L_i(k)\left(y_i(k) - C_i(k)\hat{x}_i(k)\right), \quad i = 1 \ldots N, \tag{5.4} \]

where $L_i(k)$ is the filter gain.

In the second step, the new state estimate of sensor $i$ is generated by a convex combination between $\varphi_i(k)$ and all other intermediate estimates within its communication neighborhood, i.e.

\[ \hat{x}_i(k+1) = \sum_{j=1}^{N} p_{ij}(k)\varphi_j(k), \quad i = 1 \ldots N, \tag{5.5} \]

where $p_{ij}(k)$ are non-negative scalars summing up to one ($\sum_{j=1}^{N} p_{ij}(k) = 1$), and $p_{ij}(k) = 0$ if no link from $j$ to $i$ exists at time $k$. Having $p_{ij}(k)$ dependent on time accounts for a possibly time varying communication topology.

Combining (5.4) and (5.5) we obtain the dynamic equations for the consensus-based distributed filter:

\[ \hat{x}_i(k+1) = \sum_{j=1}^{N} p_{ij}(k)\left[A(k)\hat{x}_j(k) + L_j(k)\left(y_j(k) - C_j(k)\hat{x}_j(k)\right)\right], \quad i = 1 \ldots N. \tag{5.6} \]

From (5.6) the estimation errors evolve according to

\[ \epsilon_i(k+1) = \sum_{j=1}^{N} p_{ij}(k)\left[\left(A(k) - L_j(k)C_j(k)\right)\epsilon_j(k) + w(k) - L_j(k)v_j(k)\right], \quad i = 1 \ldots N. \tag{5.7} \]
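
The two-step update (5.4)-(5.5) and the resulting error recursion (5.7) can be sketched in code as follows. The function name `cbdlf_step` and all numerical values (system matrices, gains, weights, noise samples) are hypothetical and chosen only to exercise the equations; this is not the author's implementation.

```python
import numpy as np

def cbdlf_step(A, C_list, L_list, P, xhat_list, y_list):
    """One CBDLF iteration: Luenberger update (5.4), then consensus step (5.5)."""
    phi = [A @ xh + L @ (y - C @ xh)
           for C, L, xh, y in zip(C_list, L_list, xhat_list, y_list)]
    N = len(phi)
    return [sum(P[i, j] * phi[j] for j in range(N)) for i in range(N)]

rng = np.random.default_rng(1)
A = np.array([[1.0, 0.1], [0.0, 0.9]])
C_list = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])]
L_list = [np.array([[0.5], [0.0]]), np.array([[0.0], [0.4]])]
P = np.array([[0.9, 0.1], [0.7, 0.3]])   # consensus weights, rows sum to one

x = rng.standard_normal(2)                # true state x(k)
xhat = [np.zeros(2), np.ones(2)]          # current estimates
w = 0.1 * rng.standard_normal(2)          # process noise sample
v = [0.1 * rng.standard_normal(1) for _ in range(2)]  # measurement noise samples

y = [C @ x + vi for C, vi in zip(C_list, v)]
xhat_next = cbdlf_step(A, C_list, L_list, P, xhat, y)
x_next = A @ x + w

# Error recursion (5.7) evaluated directly, for comparison.
err = [x - xh for xh in xhat]
err_next_formula = [
    sum(P[i, j] * ((A - L_list[j] @ C_list[j]) @ err[j] + w - L_list[j] @ v[j])
        for j in range(2))
    for i in range(2)
]
```

The errors `x_next - xhat_next[i]` obtained from the filter should agree with `err_next_formula[i]`, which is the content of (5.7).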

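We define the aggregate vectors of estimates, measurements, estimation errors, driving noise and measurement noise, respectively

\[ \hat{\mathbf{x}}(k)' = (\hat{x}_1(k)', \ldots, \hat{x}_N(k)'), \]
\[ \mathbf{y}(k)' = (y_1(k)', \ldots, y_N(k)'), \]
\[ \boldsymbol{\epsilon}(k)' = (\epsilon_1(k)', \ldots, \epsilon_N(k)'), \]
\[ \mathbf{w}(k)' = (w(k)', \ldots, w(k)'), \]
\[ \mathbf{v}(k)' = (v_1(k)', \ldots, v_N(k)'), \]

and the following block matrices

\[ \mathcal{A}(k) = \begin{pmatrix} A(k) & 0_{n \times n} & \cdots & 0_{n \times n} \\ 0_{n \times n} & A(k) & \cdots & 0_{n \times n} \\ \vdots & \vdots & \ddots & \vdots \\ 0_{n \times n} & 0_{n \times n} & \cdots & A(k) \end{pmatrix} \in \mathbb{R}^{nN \times nN}, \]

\[ \mathcal{C}(k) = \begin{pmatrix} C_1(k) & 0_{r_1 \times n} & \cdots & 0_{r_1 \times n} \\ 0_{r_2 \times n} & C_2(k) & \cdots & 0_{r_2 \times n} \\ \vdots & \vdots & \ddots & \vdots \\ 0_{r_N \times n} & 0_{r_N \times n} & \cdots & C_N(k) \end{pmatrix} \in \mathbb{R}^{r \times nN}, \quad \mathcal{L}(k) = \begin{pmatrix} L_1(k) & 0_{n \times r_2} & \cdots & 0_{n \times r_N} \\ 0_{n \times r_1} & L_2(k) & \cdots & 0_{n \times r_N} \\ \vdots & \vdots & \ddots & \vdots \\ 0_{n \times r_1} & 0_{n \times r_2} & \cdots & L_N(k) \end{pmatrix} \in \mathbb{R}^{nN \times r}, \]

where $r = \sum_{i=1}^{N} r_i$. The dynamics (5.6) and (5.7) can be compactly written as

\[ \hat{\mathbf{x}}(k+1) = \mathcal{P}(k)\mathcal{A}(k)\hat{\mathbf{x}}(k) + \mathcal{P}(k)\mathcal{L}(k)\left[\mathbf{y}(k) - \mathcal{C}(k)\hat{\mathbf{x}}(k)\right], \tag{5.8} \]

\[ \boldsymbol{\epsilon}(k+1) = \mathcal{P}(k)\left[\mathcal{A}(k) - \mathcal{L}(k)\mathcal{C}(k)\right]\boldsymbol{\epsilon}(k) + \mathbf{w}(k) - \mathcal{P}(k)\mathcal{L}(k)\mathbf{v}(k), \tag{5.9} \]

where $\mathcal{P}(k) = P(k) \otimes I$ and $P(k) = (p_{ij}(k))$ is a stochastic matrix, with rows summing up to one.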

Definition 5.2.1. (distributed detectability) Assuming that $A(k)$, $C(k) = \{C_i(k)\}_{i=1}^{N}$ and $p(k) = \{p_{ij}(k)\}_{i,j=1}^{N}$ are time invariant, we say that the linear system (5.2) is detectable using the CBDLF scheme (5.6), if there exists a set of matrices $L = \{L_i\}_{i=1}^{N}$ such that the system (5.7), without the noise inputs, is asymptotically stable.

We introduce the following finite horizon quadratic filtering cost function

\[ J_K(L(\cdot)) = \sum_{k=0}^{K} \sum_{i=1}^{N} E\left[\|\epsilon_i(k)\|^2\right], \tag{5.10} \]

where by $L(\cdot)$ we understand the set of matrices $L(\cdot) = \{L_i(k), k = 0 \ldots K-1\}_{i=1}^{N}$. The optimal filtering gains represent the solution of the following optimization problem

\[ L^*(\cdot) = \arg\min_{L(\cdot)} J_K(L(\cdot)). \tag{5.11} \]

Assuming that $A(k)$, $C(k) = \{C_i(k)\}_{i=1}^{N}$, $\Sigma_w(k)$, $\Sigma_v(k) = \{\Sigma_{v_i}(k)\}_{i=1}^{N}$ and $p(k) = \{p_{ij}(k)\}_{i,j=1}^{N}$ are time invariant, we can also define the infinite horizon filtering cost function

\[ J_\infty(L) = \lim_{K \to \infty} \frac{1}{K} J_K(L) = \lim_{k \to \infty} \sum_{i=1}^{N} E\left[\|\epsilon_i(k)\|^2\right], \tag{5.12} \]

where $L = \{L_i\}_{i=1}^{N}$ is the set of steady-state filtering gains. By solving the optimization problem

\[ L^* = \arg\min_{L} J_\infty(L), \tag{5.13} \]

we obtain the optimal steady-state filter gains.

In  the  next  sections  we  will  address  the  following  problems: 

Problem 5.2.1. (Detectability conditions) Under the above setup, we want to find conditions under which the system (5.2) is detectable in the sense of Definition 5.2.1.

Problem 5.2.2. (Sub-optimal scheme for consensus-based distributed filtering) Ideally, we would like to obtain the optimal filter gains by solving the optimization problems (5.11) and (5.13), respectively. Due to the complexity of these problems, we will not provide the optimal filtering gains but rather focus on providing a sub-optimal scheme with quantifiable performance.

Problem 5.2.3. (Connection with the linear filtering of a Markovian jump linear system) We draw a parallel between the consensus-based distributed linear filtering scheme and the linear filtering of a particular Markovian jump linear system.

5.3  Distributed  detectability 

We start with a toy example motivating our interest in the distributed detectability problem under the CBDLF scheme. Let us assume that no single pair $(A, C_i)$ is detectable in the classical sense, but the pair $(A, C)$ is detectable, where $C' = (C_1', \ldots, C_N')$. In this case, we can design a stable (centralized) Luenberger observer filter. The question is, can we obtain a stable consensus-based distributed filter? As the following example will show, in general this is not true. That is why it is important to find conditions under which the CBDLF can produce stable estimates.

Example 5.3.1. (Centralized detectable but not distributed detectable) Consider a linear dynamics as in (5.2)-(5.3), with two sensors, where

\[ A = \begin{pmatrix} 10 & 0 \\ 0 & 10 \end{pmatrix}, \quad C_1 = \begin{pmatrix} 1 & 0 \end{pmatrix} \quad \text{and} \quad C_2 = \begin{pmatrix} 0 & 1 \end{pmatrix}. \]

Obviously, the pairs $(A, C_1)$ and $(A, C_2)$ are not detectable while the pair $(A, C)$ is, where $C' = (C_1' \; C_2')$. Let $L_1' = (l_1 \; l_2)$ and $L_2' = (l_3 \; l_4)$. For this example, the matrix that dictates the stability property of (5.9) is given by

\[ \begin{pmatrix} p_{11}(10 - l_1) & 0 & 10 p_{12} & -p_{12} l_3 \\ -p_{11} l_2 & 10 p_{11} & 0 & p_{12}(10 - l_4) \\ p_{21}(10 - l_1) & 0 & 10 p_{22} & -p_{22} l_3 \\ -p_{21} l_2 & 10 p_{21} & 0 & p_{22}(10 - l_4) \end{pmatrix}. \]

For $p_{11} = 0.9$, $p_{12} = 0.1$, $p_{21} = 0.7$ and $p_{22} = 0.3$, the characteristic polynomial of the above matrix is given by

\[ q(s) = s^4 + q_3(l_1, l_4)s^3 + q_2(l_1, l_4, l_2 l_3)s^2 + q_1(l_1, l_4)s + q_0(l_1, l_4), \]

where

\[ q_3(l_1, l_4) = -24 + 0.9 l_1 + 0.3 l_4, \]
\[ q_2(l_1, l_4, l_2 l_3) = -0.07 l_2 l_3 - 5.6 l_4 + 184 - 12.8 l_1 + 0.27 l_1 l_4, \]
\[ q_1(l_1, l_4) = 30 l_4 - 480 - 2.4 l_1 l_4 + 42 l_1, \]
\[ q_0(l_1, l_4) = -40 l_1 - 40 l_4 + 4 l_1 l_4 + 400. \]

Let $\lambda_i(l_1, l_4, l_2 l_3)$ denote the eigenvalues of the above matrix. We define $\lambda_{\max}(l_1, l_4, l_2 l_3) = \max_i |\lambda_i(l_1, l_4, l_2 l_3)|$. The system (5.2)-(5.3) is not detectable in the sense of Definition 5.2.1 if $\lambda_{\max}(l_1, l_4, l_2 l_3) \geq 1$ for all values of $l_1$, $l_4$ and of the product $l_2 l_3$. We also introduce the quantity $\bar{\lambda}_{\max}(l_2 l_3) = \min_{l_1, l_4} \lambda_{\max}(l_1, l_4, l_2 l_3)$.

Figure 5.1: The evolution of $\bar{\lambda}_{\max}(l_2 l_3)$

From Figure 5.1, we note that $\min_{l_2 l_3} \bar{\lambda}_{\max}(l_2 l_3) = 4.498$, which shows that, for the given consensus weights and matrices $A$, $C_1$ and $C_2$, there are no values for $l_1$, $l_2$, $l_3$ and $l_4$ such that (5.9) can be made asymptotically stable.
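
The matrix and characteristic polynomial of Example 5.3.1 can be cross-checked numerically. In the sketch below, the helper names `error_matrix` and `char_poly_coeffs` and the sample gains are hypothetical; the code compares the closed-form coefficients with those computed from the matrix, and evaluates the spectral radius for one particular choice of gains.

```python
import numpy as np

def error_matrix(l1, l2, l3, l4, P):
    """(P (x) I) diag(A - L1 C1, A - L2 C2) for the data of Example 5.3.1."""
    A = 10.0 * np.eye(2)
    C1 = np.array([[1.0, 0.0]])
    C2 = np.array([[0.0, 1.0]])
    L1 = np.array([[l1], [l2]])
    L2 = np.array([[l3], [l4]])
    B = np.zeros((4, 4))
    B[:2, :2] = A - L1 @ C1
    B[2:, 2:] = A - L2 @ C2
    return np.kron(P, np.eye(2)) @ B

def char_poly_coeffs(l1, l2, l3, l4):
    """Closed-form coefficients [1, q3, q2, q1, q0] from the example."""
    q3 = -24 + 0.9 * l1 + 0.3 * l4
    q2 = -0.07 * l2 * l3 - 5.6 * l4 + 184 - 12.8 * l1 + 0.27 * l1 * l4
    q1 = 30 * l4 - 480 - 2.4 * l1 * l4 + 42 * l1
    q0 = -40 * l1 - 40 * l4 + 4 * l1 * l4 + 400
    return np.array([1.0, q3, q2, q1, q0])

P = np.array([[0.9, 0.1], [0.7, 0.3]])
gains = (3.0, -2.0, 5.0, 7.0)             # arbitrary sample values of l1..l4
numeric = np.poly(error_matrix(*gains, P))
closed_form = char_poly_coeffs(*gains)

# Spectral radius for l1 = l4 = 10 and l2 = l3 = 0.
radius = max(abs(np.linalg.eigvals(error_matrix(10.0, 0.0, 0.0, 10.0, P))))
```

For the gains `l1 = l4 = 10`, `l2 = l3 = 0` the polynomial factors as $s^2(s-3)(s-9)$, so the spectral radius is 9; this is one data point consistent with the example's conclusion, not a verification of the reported minimum.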

The CBDLF (5.8) uses only one consensus step and we have seen, through Example 5.3.1, that in general this does not guarantee stable estimates, even in the case where the pair $(A, C)$ is detectable. However, as the next proposition suggests, stable estimates might be achieved if a large enough number of consensus steps is used, i.e. we set $\mathcal{P}(k) = P(k)^{\eta} \otimes I$, for some positive integer value $\eta$, large enough.

Proposition 5.3.1. Consider the linear dynamics (5.2)-(5.3). Assume that in the CBDLF scheme (5.6), we have $p_{ij} = \frac{1}{N}$ and that $\hat{x}_i(0) = x_0$, for all $i, j = 1 \ldots N$. If the pair $(A, C)$ is detectable, then the system (5.2) is detectable as well, in the sense of Definition 5.2.1.

Proof. Rewrite the matrix $C$ as

\[ C = \sum_{i=1}^{N} \tilde{C}_i, \]

where $\tilde{C}_i' = (0_{n \times r_1} \ldots 0_{n \times r_{i-1}} \; C_i' \; 0_{n \times r_{i+1}} \ldots 0_{n \times r_N})$. Ignoring the noise, we define the measurements

\[ \tilde{y}_i(k) = \tilde{C}_i x(k), \]

which are equivalent to the ones in (5.3). Under the assumption that $p_{ij} = \frac{1}{N}$ and $\hat{x}_i(0) = x_0$ for all $i, j = 1 \ldots N$, it follows that the estimation errors respect the dynamics

\[ \epsilon(k+1) = \frac{1}{N} \sum_{i=1}^{N} (A - L_i \tilde{C}_i)\epsilon(k). \tag{5.14} \]

Setting $L_i = NL$ for $i = 1 \ldots N$, it follows that

\[ \epsilon(k+1) = (A - LC)\epsilon(k). \]

Since the pair $(A, C)$ is detectable, there exists a matrix $L$ such that $A - LC$ has all eigenvalues within the unit circle and therefore the dynamics (5.14) is asymptotically stable, which implies that (5.2) is detectable in the sense of Definition 5.2.1. □
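
The construction in the proof can be tried on the data of Example 5.3.1 with uniform weights. The sketch below is illustrative only: the centralized gain `L` is a hypothetical choice that places the eigenvalues of $A - LC$ at 0.5, and each sensor is assumed to have a scalar measurement so that its gain is $N$ times the corresponding column of `L`.

```python
import numpy as np

N, n = 2, 2
A = 10.0 * np.eye(n)
C_list = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])]
C = np.vstack(C_list)

# Hypothetical centralized gain: A - L C = 0.5 I.
L = 9.5 * np.eye(n)
# Sensor i applies N times the i-th column of L to its own measurement.
L_list = [N * L[:, [i]] for i in range(N)]

# Uniform consensus weights p_ij = 1/N.
P = np.full((N, N), 1.0 / N)
B = np.zeros((N * n, N * n))
for i in range(N):
    B[i * n:(i + 1) * n, i * n:(i + 1) * n] = A - L_list[i] @ C_list[i]
M = np.kron(P, np.eye(n)) @ B

radius_distributed = max(abs(np.linalg.eigvals(M)))
radius_centralized = max(abs(np.linalg.eigvals(A - L @ C)))
```

In this instance the aggregate error matrix `M` has the same spectral radius as the centralized observer, even though each individual matrix `A - L_list[i] @ C_list[i]` is unstable.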

The previous proposition tells us that if we achieve (average) consensus between the state estimates at each time instant, and if the pair $(A, C)$ is detectable (in the classical sense), then the system (5.2) is detectable in the sense of Definition 5.2.1. However, achieving consensus at each time instant can be costly in both time and computation, which is why it is important to find (testable) conditions under which the CBDLF produces stable estimates.

Lemma 5.3.1. (sufficient conditions for distributed detectability) If there exists a set of symmetric, positive definite matrices $\{Q_i\}_{i=1}^{N}$ and a set of matrices $\{L_i\}_{i=1}^{N}$ such that

\[ Q_i = \sum_{j=1}^{N} p_{ji}(A - L_j C_j)' Q_j (A - L_j C_j) + S_i, \quad i = 1 \ldots N, \tag{5.15} \]

for some positive definite matrices $\{S_i\}_{i=1}^{N}$, then the system (5.2) is detectable in the sense of Definition 5.2.1.

Proof. The dynamics of the estimation error without noise is given by

\[ \epsilon_i(k+1) = \sum_{j=1}^{N} p_{ij}(A - L_j C_j)\epsilon_j(k), \quad i = 1 \ldots N. \tag{5.16} \]

In order to prove the stated result we have to show that (5.16) is asymptotically stable. We define the Lyapunov function

\[ V(k) = \sum_{i=1}^{N} \epsilon_i(k)' Q_i \epsilon_i(k), \]

and our goal is to show that $V(k+1) - V(k) < 0$ for all $k \geq 0$. The Lyapunov difference is given by

\[ V(k+1) - V(k) = \sum_{i=1}^{N} \left[ \left( \sum_{j=1}^{N} p_{ij}(A - L_j C_j)\epsilon_j(k) \right)' Q_i \left( \sum_{j=1}^{N} p_{ij}(A - L_j C_j)\epsilon_j(k) \right) - \epsilon_i(k)' Q_i \epsilon_i(k) \right] \]
\[ \leq \sum_{i=1}^{N} \left[ \sum_{j=1}^{N} p_{ij}\,\epsilon_j(k)'(A - L_j C_j)' Q_i (A - L_j C_j)\epsilon_j(k) - \epsilon_i(k)' Q_i \epsilon_i(k) \right], \tag{5.17} \]

where the inequality followed from Remark 5.1.1. By changing the summation order we can further write

\[ V(k+1) - V(k) \leq \sum_{i=1}^{N} \epsilon_i(k)' \left( \sum_{j=1}^{N} p_{ji}(A - L_j C_j)' Q_j (A - L_j C_j) - Q_i \right) \epsilon_i(k). \]

Using (5.15) yields

\[ V(k+1) - V(k) \leq -\sum_{i=1}^{N} \epsilon_i(k)' S_i \epsilon_i(k). \]

From the fact that $\{S_i\}_{i=1}^{N}$ are positive definite matrices, we get

\[ V(k+1) - V(k) < 0, \]

which implies that (5.16) is asymptotically stable. □
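
One simple way to look for matrices satisfying a coupled equation of the form (5.15), for fixed gains, is a fixed-point iteration. This is an illustrative device and not the method used in the text (which relies on linear matrix inequalities): the iteration converges only when the underlying linear operator is a contraction, and a failure to converge says nothing about detectability. The system matrices, gains, and weights below are hypothetical.

```python
import numpy as np

def solve_coupled_lyapunov(B_list, P, S_list, iters=500):
    """Fixed-point iteration for Q_i = sum_j p_ji B_j' Q_j B_j + S_i, cf. (5.15)."""
    N = len(B_list)
    Q = [S.copy() for S in S_list]
    for _ in range(iters):
        Q = [sum(P[j, i] * B_list[j].T @ Q[j] @ B_list[j] for j in range(N)) + S_list[i]
             for i in range(N)]
    return Q

A = np.array([[1.1, 0.0], [0.0, 0.5]])
C_list = [np.array([[1.0, 0.0]]), np.array([[1.0, 0.0]])]
L_list = [np.array([[0.8], [0.0]]), np.array([[0.9], [0.0]])]
P = np.array([[0.9, 0.1], [0.7, 0.3]])
B_list = [A - L @ C for L, C in zip(L_list, C_list)]   # B_j = A - L_j C_j
S_list = [np.eye(2), np.eye(2)]

Q = solve_coupled_lyapunov(B_list, P, S_list)

# Residual of (5.15) and positive definiteness of the computed Q_i.
residual = max(
    np.abs(Q[i] - sum(P[j, i] * B_list[j].T @ Q[j] @ B_list[j] for j in range(2))
           - S_list[i]).max()
    for i in range(2)
)
min_eig = min(np.linalg.eigvalsh(Qi).min() for Qi in Q)

# Spectral radius of the noise-free error dynamics (P (x) I) diag(B_1, B_2).
B_blk = np.zeros((4, 4))
B_blk[:2, :2], B_blk[2:, 2:] = B_list[0], B_list[1]
radius = max(abs(np.linalg.eigvals(np.kron(P, np.eye(2)) @ B_blk)))
```

For this data the iteration settles on positive definite matrices `Q[i]`, and the spectral radius of the error dynamics is below one, as the lemma predicts.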

The following result relates the existence of the sets of matrices $\{Q_i\}_{i=1}^{N}$ and $\{L_i\}_{i=1}^{N}$ such that (5.15) is satisfied, with the feasibility of a set of linear matrix inequalities (LMI).

Proposition 5.3.2. (distributed detectability test) The linear system (5.2) is detectable in the sense of Definition 5.2.1 if the following linear matrix inequalities, in the variables $\{X_i\}_{i=1}^{N}$ and $\{Y_i\}_{i=1}^{N}$, are feasible

\[ \begin{pmatrix} X_i & \sqrt{p_{1i}}(A'X_1 - C_1'Y_1') & \sqrt{p_{2i}}(A'X_2 - C_2'Y_2') & \cdots & \sqrt{p_{Ni}}(A'X_N - C_N'Y_N') \\ \sqrt{p_{1i}}(X_1 A - Y_1 C_1) & X_1 & 0 & \cdots & 0 \\ \sqrt{p_{2i}}(X_2 A - Y_2 C_2) & 0 & X_2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \sqrt{p_{Ni}}(X_N A - Y_N C_N) & 0 & 0 & \cdots & X_N \end{pmatrix} > 0, \tag{5.18} \]

for $i = 1 \ldots N$ and where $\{X_i\}_{i=1}^{N}$ are symmetric. Moreover, a stable CBDLF is obtained by choosing the filter gains as $L_i = X_i^{-1} Y_i$ for $i = 1 \ldots N$.

Proof. First we note that, by the Schur complement lemma, the linear matrix inequalities (5.18) are feasible if and only if there exist a set of symmetric matrices $\{X_i\}_{i=1}^{N}$ and a set of matrices $\{Y_i\}_{i=1}^{N}$ such that

\[ X_i - \sum_{j=1}^{N} p_{ji}(X_j A - Y_j C_j)' X_j^{-1} (X_j A - Y_j C_j) > 0, \quad X_i > 0, \]

for all $i = 1 \ldots N$. We further have that

\[ X_i - \sum_{j=1}^{N} p_{ji}(A - X_j^{-1} Y_j C_j)' X_j (A - X_j^{-1} Y_j C_j) > 0, \quad X_i > 0. \]

By defining $L_i = X_i^{-1} Y_i$, it follows that

\[ X_i - \sum_{j=1}^{N} p_{ji}(A - L_j C_j)' X_j (A - L_j C_j) > 0, \quad X_i > 0. \]

Therefore, if the matrix inequalities (5.18) are feasible, there exist a set of positive definite matrices $\{X_i\}_{i=1}^{N}$ and a set of positive definite matrices $\{S_i\}_{i=1}^{N}$, such that

\[ X_i = \sum_{j=1}^{N} p_{ji}(A - L_j C_j)' X_j (A - L_j C_j) + S_i, \quad i = 1 \ldots N. \]

By Lemma 5.3.1, it follows that the linear dynamics (5.7), without noise, is asymptotically stable, and therefore the system (5.2) is detectable in the sense of Definition 5.2.1. □

5.4 Sub-Optimal Consensus-Based Distributed Linear Filtering

Obtaining the closed form solution of the optimization problem (5.11) is a challenging problem, which is in the same spirit as the decentralized optimal control problem. In this section we provide a sub-optimal algorithm for computing the filter gains of the CBDLF, with quantifiable performance in the sense that we compute a set of filtering gains which guarantee a certain level of performance with respect to the quadratic cost (5.10).

5.4.1 Finite Horizon Sub-Optimal Consensus-Based Distributed Linear Filtering

The sub-optimal scheme for computing the CBDLF gains results from minimizing an upper bound of the quadratic filtering cost (5.10). The following proposition gives upper bounds for the covariance matrices of the estimation errors.

Proposition 5.4.1. Consider the following coupled difference equations

\[ Q_i(k+1) = \sum_{j=1}^{N} p_{ij}(k)\left[\left(A(k) - L_j(k)C_j(k)\right) Q_j(k) \left(A(k) - L_j(k)C_j(k)\right)' + L_j(k)\Sigma_{v_j}(k)L_j(k)'\right] + \Sigma_w(k), \tag{5.19} \]

with $Q_i(0) = \Sigma_i(0)$, for $i = 1 \ldots N$. The following inequality holds

\[ \Sigma_i(k) \leq Q_i(k), \tag{5.20} \]

for $i = 1 \ldots N$ and for all $k \geq 0$.

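Proof. Using (5.7), the matrix $\Sigma_i(k+1)$ can be explicitly written as

\[ \Sigma_i(k+1) = E[\epsilon_i(k+1)\epsilon_i(k+1)'] = E\left[ \left( \sum_{j=1}^{N} p_{ij}(k)\left(A(k) - L_j(k)C_j(k)\right)\epsilon_j(k) + w(k) - \sum_{j=1}^{N} p_{ij}(k)L_j(k)v_j(k) \right) \left( \sum_{j=1}^{N} p_{ij}(k)\left(A(k) - L_j(k)C_j(k)\right)\epsilon_j(k) + w(k) - \sum_{j=1}^{N} p_{ij}(k)L_j(k)v_j(k) \right)' \right]. \]

Using the fact that the noises $w(k)$ and $v_j(k)$ have zero mean, and that they are independent with respect to themselves and the initial state, for every time instant, we can further write

\[ \Sigma_i(k+1) = E\left[ \left( \sum_{j=1}^{N} p_{ij}(k)\left(A(k) - L_j(k)C_j(k)\right)\epsilon_j(k) \right) \left( \sum_{j=1}^{N} p_{ij}(k)\left(A(k) - L_j(k)C_j(k)\right)\epsilon_j(k) \right)' \right] + E\left[ \left( \sum_{j=1}^{N} p_{ij}(k)L_j(k)v_j(k) \right) \left( \sum_{j=1}^{N} p_{ij}(k)L_j(k)v_j(k) \right)' \right] + \Sigma_w(k). \]

By Remark 5.1.2, it follows that

\[ E\left[ \left( \sum_{j=1}^{N} p_{ij}(k)\left(A(k) - L_j(k)C_j(k)\right)\epsilon_j(k) \right) \left( \sum_{j=1}^{N} p_{ij}(k)\left(A(k) - L_j(k)C_j(k)\right)\epsilon_j(k) \right)' \right] \leq \sum_{j=1}^{N} p_{ij}(k)\left(A(k) - L_j(k)C_j(k)\right)\Sigma_j(k)\left(A(k) - L_j(k)C_j(k)\right)' \]

and

\[ E\left[ \left( \sum_{j=1}^{N} p_{ij}(k)L_j(k)v_j(k) \right) \left( \sum_{j=1}^{N} p_{ij}(k)L_j(k)v_j(k) \right)' \right] \leq \sum_{j=1}^{N} p_{ij}(k)L_j(k)\Sigma_{v_j}(k)L_j(k)', \quad i = 1 \ldots N. \]

From the previous two expressions, we obtain that

\[ \Sigma_i(k+1) \leq \sum_{j=1}^{N} p_{ij}(k)\left(A(k) - L_j(k)C_j(k)\right)\Sigma_j(k)\left(A(k) - L_j(k)C_j(k)\right)' + \sum_{j=1}^{N} p_{ij}(k)L_j(k)\Sigma_{v_j}(k)L_j(k)' + \Sigma_w(k). \]

We prove (5.20) by induction. Assume that $\Sigma_i(k) \leq Q_i(k)$ for all $i = 1 \ldots N$. Then

\[ \left(A(k) - L_i(k)C_i(k)\right)\Sigma_i(k)\left(A(k) - L_i(k)C_i(k)\right)' \leq \left(A(k) - L_i(k)C_i(k)\right)Q_i(k)\left(A(k) - L_i(k)C_i(k)\right)', \]

and

\[ L_i(k)\Sigma_i(k)L_i(k)' \leq L_i(k)Q_i(k)L_i(k)', \quad i = 1 \ldots N, \]

and therefore

\[ \Sigma_i(k+1) \leq Q_i(k+1), \quad i = 1 \ldots N. \]

□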
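
The bound (5.20) can be illustrated by propagating the exact error covariance implied by (5.9) next to the recursion (5.19). The sketch below assumes time-invariant matrices, measurement noises that are independent across sensors, and identical initial estimates at all sensors (so that all initial errors coincide); all numerical values are hypothetical.

```python
import numpy as np

n, N = 2, 2
A = np.array([[1.0, 0.1], [0.0, 0.9]])
C_list = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])]
L_list = [np.array([[0.5], [0.0]]), np.array([[0.0], [0.4]])]
P = np.array([[0.9, 0.1], [0.7, 0.3]])
Sigma_w = 0.1 * np.eye(n)
Sigma_v = [np.array([[0.2]]), np.array([[0.2]])]
Sigma_0 = np.eye(n)

B_list = [A - L @ C for L, C in zip(L_list, C_list)]
Pk = np.kron(P, np.eye(n))
B_blk = np.zeros((n * N, n * N))
LSL_blk = np.zeros((n * N, n * N))
for j in range(N):
    sl = slice(j * n, (j + 1) * n)
    B_blk[sl, sl] = B_list[j]
    LSL_blk[sl, sl] = L_list[j] @ Sigma_v[j] @ L_list[j].T

ones = np.ones((N, N))
cov = np.kron(ones, Sigma_0)              # exact covariance of the aggregate error
Q = [Sigma_0.copy() for _ in range(N)]    # upper bounds Q_i(k) from (5.19)
worst = np.inf
for k in range(30):
    # Exact propagation; the same w(k) enters every sensor's error.
    cov = Pk @ B_blk @ cov @ B_blk.T @ Pk.T + np.kron(ones, Sigma_w) + Pk @ LSL_blk @ Pk.T
    # Bound recursion (5.19).
    Q = [sum(P[i, j] * (B_list[j] @ Q[j] @ B_list[j].T
                        + L_list[j] @ Sigma_v[j] @ L_list[j].T) for j in range(N)) + Sigma_w
         for i in range(N)]
    for i in range(N):
        sl = slice(i * n, (i + 1) * n)
        worst = min(worst, np.linalg.eigvalsh(Q[i] - cov[sl, sl]).min())
```

The quantity `worst` is the smallest eigenvalue of $Q_i(k) - \Sigma_i(k)$ over all sensors and steps; it should be non-negative up to rounding, in agreement with (5.20).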


Defining the finite horizon quadratic cost function

\[ \bar{J}_K(L(\cdot)) = \sum_{k=1}^{K} \sum_{i=1}^{N} \mathrm{tr}(Q_i(k)), \tag{5.21} \]

the next Corollary follows immediately.

Corollary 5.4.1. The following inequalities hold

\[ J_K(L(\cdot)) \leq \bar{J}_K(L(\cdot)), \tag{5.22} \]

and

\[ \limsup_{K \to \infty} \frac{1}{K} J_K(L) \leq \limsup_{K \to \infty} \frac{1}{K} \bar{J}_K(L). \tag{5.23} \]

Proof. Follows immediately from Proposition 5.4.1. □

In the previous Corollary we obtained an upper bound on the filtering cost function. Our sub-optimal consensus-based distributed filtering scheme will result from minimizing this upper bound in terms of the filtering gains $\{L_i(k)\}_{i=1}^{N}$:

\[ \min_{L(\cdot)} \bar{J}_K(L(\cdot)). \tag{5.24} \]


Proposition 5.4.2. The optimal solution for the optimization problem (5.24) is

$$L_i^*(k) = A(k)Q_i^*(k)C_i(k)'\left[\Sigma_{v_i}(k) + C_i(k)Q_i^*(k)C_i(k)'\right]^{-1}, \qquad (5.25)$$

and the optimal value is given by

$$\bar{J}_K^*(L^*(\cdot)) = \sum_{k=1}^{K}\sum_{i=1}^{N} \mathrm{tr}(Q_i^*(k)),$$

where $Q_i^*(k)$ is computed using

$$Q_i^*(k+1) = \sum_{j=1}^{N} p_{ij}(k)\left[A(k)Q_j^*(k)A(k)' + \Sigma_w(k) - A(k)Q_j^*(k)C_j(k)'\left(\Sigma_{v_j}(k) + C_j(k)Q_j^*(k)C_j(k)'\right)^{-1}C_j(k)Q_j^*(k)A(k)'\right], \qquad (5.26)$$

with $Q_i^*(0) = \Sigma_i(0)$ and for $i = 1,\ldots,N$.

Proof. Let $\bar{J}_K(L(\cdot))$ be the cost function when an arbitrary set of filtering gains $L(\cdot) = \{L_i(k), k = 0,\ldots,K-1\}_{i=1}^{N}$ is used in (5.19). We will show that $\bar{J}_K^*(L^*(\cdot)) \le \bar{J}_K(L(\cdot))$, which in turn will show that $L^*(\cdot) = \{L_i^*(k), k = 0,\ldots,K-1\}_{i=1}^{N}$ is the optimal solution of the optimization problem (5.24). Let $\{Q_i^*(k)\}_{i=1}^{N}$ and $\{Q_i(k)\}_{i=1}^{N}$ be the matrices obtained when $L^*(\cdot)$ and $L(\cdot)$, respectively, are substituted in (5.19). In what follows we will show by induction that $Q_i^*(k) \preceq Q_i(k)$ for $k \ge 0$ and $i = 1,\ldots,N$, which proves that $\bar{J}_K^*(L^*(\cdot)) \le \bar{J}_K(L(\cdot))$ for any $L(\cdot)$. To simplify the proof, we will omit in what follows the time index for some matrices and for the consensus weights.

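Substituting $\{L_i^*(k), k \ge 0\}_{i=1}^{N}$ in (5.19), after some matrix manipulations we get

$$Q_i^*(k+1) = \sum_{j=1}^{N} p_{ij}\left[AQ_j^*(k)A' + \Sigma_w - AQ_j^*(k)C_j'\left(\Sigma_{v_j} + C_jQ_j^*(k)C_j'\right)^{-1}C_jQ_j^*(k)A'\right], \quad Q_i^*(0) = \Sigma_i(0), \quad i = 1,\ldots,N.$$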

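We can derive the following matrix identity (for simplicity we drop the time index), where $L_i^* = AQ_iC_i'(\Sigma_{v_i} + C_iQ_iC_i')^{-1}$ and the signs agree with the error dynamics $A - L_iC_i$ used in (5.19):

$$(A - L_iC_i)Q_i(A - L_iC_i)' + L_i\Sigma_{v_i}L_i' = (A - L_i^*C_i)Q_i(A - L_i^*C_i)' + L_i^*\Sigma_{v_i}L_i^{*\prime} + (L_i - L_i^*)(\Sigma_{v_i} + C_iQ_iC_i')(L_i - L_i^*)'. \qquad (5.27)$$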
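The identity (5.27) is a completion of squares in the gain, and it can be sanity-checked numerically. The following is a minimal sketch, assuming the error dynamics $A - L_iC_i$ from the bound recursion and randomly generated matrices; all variable names are illustrative and not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2                                   # state and measurement sizes (arbitrary)
A = rng.normal(size=(n, n))
C = rng.normal(size=(m, n))
G = rng.normal(size=(n, n))
Q = G @ G.T                                   # positive semi-definite Q_i
H = rng.normal(size=(m, m))
Sv = H @ H.T + np.eye(m)                      # positive definite noise covariance
L = rng.normal(size=(n, m))                   # arbitrary gain L_i

S = Sv + C @ Q @ C.T                          # innovation-type matrix
L_star = A @ Q @ C.T @ np.linalg.inv(S)       # gain of the form (5.25)

lhs = (A - L @ C) @ Q @ (A - L @ C).T + L @ Sv @ L.T
rhs = ((A - L_star @ C) @ Q @ (A - L_star @ C).T
       + L_star @ Sv @ L_star.T
       + (L - L_star) @ S @ (L - L_star).T)

identity_gap = np.max(np.abs(lhs - rhs))      # should be at rounding level
```

Because the last term on the right-hand side is positive semi-definite, the check also illustrates why no gain can produce a smaller matrix than $L_i^*$ for a fixed $Q_i$.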


Assume that $Q_i^*(k) \preceq Q_i(k)$ for $i = 1,\ldots,N$. Using identity (5.27), the dynamics of $Q_i^*(k)$ becomes

$$Q_i^*(k+1) = \sum_{j=1}^{N} p_{ij}\Big((A - L_j(k)C_j)Q_j^*(k)(A - L_j(k)C_j)' + L_j(k)\Sigma_{v_j}L_j(k)' - (L_j(k) - L_j^*(k))(\Sigma_{v_j} + C_jQ_j^*(k)C_j')(L_j(k) - L_j^*(k))' + \Sigma_w\Big).$$

The difference $Q_i^*(k+1) - Q_i(k+1)$ can be written as

$$Q_i^*(k+1) - Q_i(k+1) = \sum_{j=1}^{N} p_{ij}\Big((A - L_j(k)C_j)(Q_j^*(k) - Q_j(k))(A - L_j(k)C_j)' - (L_j(k) - L_j^*(k))(\Sigma_{v_j} + C_jQ_j^*(k)C_j')(L_j(k) - L_j^*(k))'\Big).$$

Since $\Sigma_{v_i} + C_iQ_i^*(k)C_i'$ is positive definite for all $k \ge 0$ and $i = 1,\ldots,N$, and since we assumed that $Q_i^*(k) \preceq Q_i(k)$, it follows that $Q_i^*(k+1) \preceq Q_i(k+1)$. Hence we obtain that

$$\bar{J}_K^*(L^*(\cdot)) \le \bar{J}_K(L(\cdot))$$

for any set of filtering gains $L(\cdot) = \{L_i(k), k = 0,\ldots,K-1\}_{i=1}^{N}$, which concludes the proof. □
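The ordering $Q_i^*(k) \preceq Q_i(k)$ established in the proof can be observed numerically. The sketch below assumes that recursion (5.19) has the form of the upper-bound recursion derived before Proposition 5.4.2; the two-sensor system, the weights and the helper names `bound_step` and `optimal_gains` are hypothetical choices made only for illustration.

```python
import numpy as np

def bound_step(Q, L, A, C, Sv, Sw, P):
    """One step of the bound recursion for all sensors, given gains L_j."""
    N = len(Q)
    M = []
    for j in range(N):
        F = A - L[j] @ C[j]
        M.append(F @ Q[j] @ F.T + L[j] @ Sv[j] @ L[j].T)
    return [sum(P[i, j] * M[j] for j in range(N)) + Sw for i in range(N)]

def optimal_gains(Q, A, C, Sv):
    """Gains of the form (5.25) evaluated at the current matrices Q_j."""
    return [A @ Qj @ Cj.T @ np.linalg.inv(Svj + Cj @ Qj @ Cj.T)
            for Qj, Cj, Svj in zip(Q, C, Sv)]

rng = np.random.default_rng(2)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
C = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])]
Sv = [np.array([[0.1]]), np.array([[0.2]])]
Sw = 0.05 * np.eye(2)
P = np.array([[0.7, 0.3], [0.4, 0.6]])        # row-stochastic weights

Q_opt = [np.eye(2), np.eye(2)]                # driven by the optimal gains
Q_arb = [np.eye(2), np.eye(2)]                # driven by perturbed gains
min_gap = np.inf
for k in range(20):
    L_opt = optimal_gains(Q_opt, A, C, Sv)
    L_arb = [Lj + 0.3 * rng.normal(size=Lj.shape)
             for Lj in optimal_gains(Q_arb, A, C, Sv)]
    Q_opt = bound_step(Q_opt, L_opt, A, C, Sv, Sw, P)
    Q_arb = bound_step(Q_arb, L_arb, A, C, Sv, Sw, P)
    for Qo, Qa in zip(Q_opt, Q_arb):
        # smallest eigenvalue of Q_i(k) - Q_i^*(k); nonnegative if the order holds
        min_gap = min(min_gap, np.linalg.eigvalsh(Qa - Qo).min())
```

A nonnegative `min_gap` (up to rounding) is what the induction argument predicts for any choice of the perturbed gains.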


We  summarize  in  the  following  algorithm  the  sub-optimal  CBDLF  scheme  resulting 
from  Proposition  5.4.2. 

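Algorithm 1: Consensus Based Distributed Linear Filtering Algorithm

Input: $\mu_0$, $\Sigma_0$

Initialization: $\hat{x}_i(0) = \mu_0$, $Y_i(0) = \Sigma_0$

while new data exists do

    Compute the filter gains:

        $L_i \leftarrow AY_iC_i'(\Sigma_{v_i} + C_iY_iC_i')^{-1}$

    Update the state estimates:

        $\varphi_i \leftarrow A\hat{x}_i + L_i(y_i - C_i\hat{x}_i)$

        $\hat{x}_i \leftarrow \sum_{j} p_{ij}\varphi_j$

    Update the matrices $Y_i$:

        $Y_i \leftarrow \sum_{j=1}^{N} p_{ij}\left((A - L_jC_j)Y_j(A - L_jC_j)' + L_j\Sigma_{v_j}L_j'\right) + \Sigma_w$

end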
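The steps of Algorithm 1 can be sketched in NumPy as follows. This is a minimal, centralized simulation of what each sensor would compute locally, assuming time-invariant matrices and row-stochastic weights; the function name `cbdlf_step`, the three-sensor example and all numerical values are hypothetical and serve only to make the update order concrete.

```python
import numpy as np

def cbdlf_step(A, C, Sv, Sw, P, x_hat, Y, y):
    """One iteration of the filter for all N sensors (simulated in one place).

    A: (n, n); C[i]: (m_i, n); Sv[i]: (m_i, m_i); Sw: (n, n);
    P: (N, N) consensus weights; x_hat[i]: (n,); Y[i]: (n, n); y[i]: (m_i,).
    """
    N = len(C)
    # Filter gains L_i = A Y_i C_i' (Sv_i + C_i Y_i C_i')^{-1}
    L = [A @ Y[i] @ C[i].T @ np.linalg.inv(Sv[i] + C[i] @ Y[i] @ C[i].T)
         for i in range(N)]
    # Local update phi_i = A x_i + L_i (y_i - C_i x_i)
    phi = [A @ x_hat[i] + L[i] @ (y[i] - C[i] @ x_hat[i]) for i in range(N)]
    # Local matrix update before mixing
    M = []
    for j in range(N):
        F = A - L[j] @ C[j]
        M.append(F @ Y[j] @ F.T + L[j] @ Sv[j] @ L[j].T)
    # Consensus (mixing) step with the neighbors' quantities
    x_new = [sum(P[i, j] * phi[j] for j in range(N)) for i in range(N)]
    Y_new = [sum(P[i, j] * M[j] for j in range(N)) + Sw for i in range(N)]
    return x_new, Y_new, L

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
C = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]), np.array([[1.0, 1.0]])]
Sv = [np.array([[0.1]]), np.array([[0.2]]), np.array([[0.3]])]
Sw = 0.05 * np.eye(2)
P = np.array([[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]])

x = rng.normal(size=2)                         # true (simulated) state
x_hat = [np.zeros(2) for _ in range(3)]        # mu_0 = 0
Y = [np.eye(2) for _ in range(3)]              # Sigma_0 = I
for _ in range(50):
    y = [C[i] @ x + rng.multivariate_normal(np.zeros(1), Sv[i]) for i in range(3)]
    x_hat, Y, L = cbdlf_step(A, C, Sv, Sw, P, x_hat, Y, y)
    x = A @ x + rng.multivariate_normal(np.zeros(2), Sw)
```

In a real deployment each sensor would only evaluate its own row of the mixing step using the quantities received from its neighbors; the loop over all sensors here is a simulation convenience.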




5.4.2  Infinite  Horizon  Consensus  Based  Distributed  Filtering 

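We now assume that the matrices $A(k)$, $\{C_i(k)\}_{i=1}^{N}$, $\{\Sigma_{v_i}(k)\}_{i=1}^{N}$ and $\Sigma_w(k)$ and the weights $\{p_{ij}(k)\}_{i,j=1}^{N}$ are time invariant. We are interested in finding out under what conditions Algorithm 1 converges and whether the filtering gains produce stable estimates. From the previous section we note that the optimal infinite horizon cost can be written as

$$\bar{J}_\infty^* = \lim_{k\to\infty} \sum_{i=1}^{N} \mathrm{tr}(Q_i^*(k)),$$

where the dynamics of $Q_i^*(k)$ is given by

$$Q_i^*(k+1) = \sum_{j=1}^{N} p_{ij}\left[AQ_j^*(k)A' + \Sigma_w - AQ_j^*(k)C_j'\left(\Sigma_{v_j} + C_jQ_j^*(k)C_j'\right)^{-1}C_jQ_j^*(k)A'\right], \qquad (5.28)$$

and the optimal filtering gains are given by

$$L_i^*(k) = AQ_i^*(k)C_i'\left[\Sigma_{v_i} + C_iQ_i^*(k)C_i'\right]^{-1},$$

for $i = 1,\ldots,N$. Assuming that (5.28) converges, the optimal value of the cost $\bar{J}_\infty^*$ is given by

$$\bar{J}_\infty^* = \sum_{i=1}^{N} \mathrm{tr}(\bar{Q}_i),$$

where $\{\bar{Q}_i\}_{i=1}^{N}$ satisfy

$$\bar{Q}_i = \sum_{j=1}^{N} p_{ij}\left[A\bar{Q}_jA' + \Sigma_w - A\bar{Q}_jC_j'\left(\Sigma_{v_j} + C_j\bar{Q}_jC_j'\right)^{-1}C_j\bar{Q}_jA'\right]. \qquad (5.29)$$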
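When the recursion (5.28) converges, its limit can be approximated by simply iterating it until successive iterates stop changing. The sketch below does this for a small example; the helper names `riccati_map` and `iterate_to_fixed_point`, the stopping tolerance, the initial condition and the system matrices are all assumptions made for illustration.

```python
import numpy as np

def riccati_map(Q, A, C, Sv, Sw, P):
    """Right-hand side of the coupled recursion (5.28) for all sensors."""
    N = len(Q)
    R = []
    for j in range(N):
        S = Sv[j] + C[j] @ Q[j] @ C[j].T
        K = A @ Q[j] @ C[j].T
        R.append(A @ Q[j] @ A.T + Sw - K @ np.linalg.solve(S, K.T))
    return [sum(P[i, j] * R[j] for j in range(N)) for i in range(N)]

def iterate_to_fixed_point(A, C, Sv, Sw, P, tol=1e-10, max_iter=10_000):
    """Iterate (5.28) from Q_i(0) = Sw (an arbitrary choice) until it stalls."""
    Q = [Sw.copy() for _ in C]
    for _ in range(max_iter):
        Q_next = riccati_map(Q, A, C, Sv, Sw, P)
        if max(np.max(np.abs(a - b)) for a, b in zip(Q_next, Q)) < tol:
            return Q_next
        Q = Q_next
    raise RuntimeError("recursion did not converge within max_iter")

A = np.array([[0.9, 0.1], [0.0, 0.8]])
C = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]), np.array([[1.0, 1.0]])]
Sv = [np.array([[0.1]]), np.array([[0.2]]), np.array([[0.3]])]
Sw = 0.05 * np.eye(2)
P = np.array([[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]])

Q_bar = iterate_to_fixed_point(A, C, Sv, Sw, P)
cost = sum(np.trace(Q) for Q in Q_bar)        # value of the infinite horizon cost
gains = [A @ Q @ Ci.T @ np.linalg.inv(Svi + Ci @ Q @ Ci.T)
         for Q, Ci, Svi in zip(Q_bar, C, Sv)]  # steady-state filtering gains
```

The steady-state gains computed at the end can be stored on the sensors in place of the time-varying gains of Algorithm 1, at the price of a transient that is no longer optimized.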

Sufficient conditions under which there exists a unique solution of (5.29) are provided by Proposition A.2.1, which says that if $(p, C, A)$ is detectable and $(A, \Sigma_w^{1/2}, p)$ is stabilizable in the sense of Definitions A.1.1 and A.1.2, respectively, then there is a unique solution of (5.29) and $\lim_{k\to\infty} Q_i^*(k) = \bar{Q}_i$.

Mimicking Theorem A.12 of [11], it can be shown that a numerical approach to solve (5.29) (if it has a solution) can be obtained by (numerically) solving the following convex programming optimization problem

$$\max\ \mathrm{tr}\left(\sum_{i=1}^{N} Q_i\right)$$

subject to

$$\begin{pmatrix}
-Q_i + \sum_{j=1}^{N} p_{ij}AQ_jA' + \Sigma_w & \sqrt{p_{i1}}\,AQ_1C_1' & \cdots & \sqrt{p_{iN}}\,AQ_NC_N' \\
\sqrt{p_{i1}}\,C_1Q_1A' & \Sigma_{v_1} + C_1Q_1C_1' & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
\sqrt{p_{iN}}\,C_NQ_NA' & 0 & \cdots & \Sigma_{v_N} + C_NQ_NC_N'
\end{pmatrix} \succeq 0, \qquad Q_i \succeq 0, \quad i = 1,\ldots,N. \qquad (5.30)$$

5.5  Connection  with  Markovian  Jump  Linear  System  state  estimation 

In  this  section  we  present  a  connection  between  the  detectability  of  (5.2)  in  the 
sense  of  Definition  5.2.1  and  the  detectability  property  of  a  MJLS,  which  is  defined  in 
what  follows.  We  also  show  that  the  optimal  gains  of  a  linear  filter  for  the  state  estimation 
of  the  aforementioned  MJLS  can  be  used  to  approximate  the  solution  of  the  optimization 
problem (5.11), which gives the optimal CBDLF. We assume that the matrix $P(k)$ describing the communication topology of the sensors is irreducible and doubly stochastic and we assume, without loss of generality, that the matrices $\{C_i(k), k \ge 0\}_{i=1}^{N}$ in the sensing model (5.3) have the same dimensions. We define the following Markovian jump linear system

$$\begin{aligned}
\xi(k+1) &= \tilde{A}_{\theta(k)}(k)\xi(k) + \tilde{B}_{\theta(k)}(k)w(k), \\
z(k) &= \tilde{C}_{\theta(k)}(k)\xi(k) + \tilde{D}_{\theta(k)}(k)v(k), \quad \xi(0) = \xi_0,
\end{aligned} \qquad (5.31)$$

where $\xi(k)$ is the state, $z(k)$ is the output, $\theta(k) \in \{1,\ldots,N\}$ is a Markov chain with probability transition matrix $P(k)'$, and $w(k)$ and $v(k)$ are independent Gaussian random variables with zero mean and identity covariance matrices. Also, $\xi_0$ is a Gaussian noise with mean $\mu_0$ and covariance matrix $\Sigma_0$. We denote by $\pi_i(k)$ the probability distribution of $\theta(k)$ ($\Pr(\theta(k) = i) = \pi_i(k)$) and we assume that $\pi_i(0) > 0$. We have that $\tilde{A}_{\theta(k)}(k) \in \{\tilde{A}_i(k)\}_{i=1}^{N}$, $\tilde{B}_{\theta(k)}(k) \in \{\tilde{B}_i(k)\}_{i=1}^{N}$, $\tilde{C}_{\theta(k)}(k) \in \{\tilde{C}_i(k)\}_{i=1}^{N}$ and $\tilde{D}_{\theta(k)}(k) \in \{\tilde{D}_i(k)\}_{i=1}^{N}$, where the index $i$ refers to the state $i$ of $\theta(k)$. We set

$$\tilde{A}_i(k) = A(k), \quad \tilde{B}_i(k) = \frac{\sqrt{\pi_i(0)}}{\sqrt{\pi_i(k)}}\,\Sigma_w^{1/2}(k), \quad \tilde{C}_i(k) = \frac{1}{\sqrt{\pi_i(0)}}\,C_i(k), \quad \tilde{D}_i(k) = \frac{1}{\sqrt{\pi_i(k)}}\,\Sigma_{v_i}^{1/2}(k), \qquad (5.32)$$

for all $i, k \ge 0$ (note that since $P(k)$ is assumed doubly stochastic and irreducible and $\pi_i(0) > 0$, we have that $\pi_i(k) > 0$ for all $i, k \ge 0$). In addition, $\xi_0$, $\theta(k)$, $w(k)$ and $v(k)$ are assumed independent for all $k \ge 0$. The random process $\theta(k)$ is also called mode. Assuming that the mode is directly observed, a linear filter for the state estimation is given by

$$\hat{\xi}(k+1) = \tilde{A}_{\theta(k)}(k)\hat{\xi}(k) + M_{\theta(k)}(k)\left(z(k) - \tilde{C}_{\theta(k)}(k)\hat{\xi}(k)\right), \qquad (5.33)$$

where we assume that the filter gain $M_{\theta(k)}$ depends only on the current mode. The dynamics of the estimation error $e(k) = \xi(k) - \hat{\xi}(k)$ is given by

$$e(k+1) = \left(\tilde{A}_{\theta(k)}(k) - M_{\theta(k)}(k)\tilde{C}_{\theta(k)}(k)\right)e(k) + \tilde{B}_{\theta(k)}(k)w(k) - M_{\theta(k)}(k)\tilde{D}_{\theta(k)}(k)v(k). \qquad (5.34)$$

Let $\mu(k)$ and $Y(k)$ denote the mean and the covariance matrix of $e(k)$, i.e. $\mu(k) = E[e(k)]$ and $Y(k) = E[e(k)e(k)']$, respectively. We define also the mean and the covariance matrix of $e(k)$ when the system is in mode $i$, i.e. $\mu_i(k) = E[e(k)\mathbf{1}_{\{\theta(k)=i\}}]$ and $Y_i(k) = E[e(k)e(k)'\mathbf{1}_{\{\theta(k)=i\}}]$, where $\mathbf{1}_{\{\theta(k)=i\}}$ is the indicator function. It follows immediately that $\mu(k) = \sum_{i=1}^{N} \mu_i(k)$ and $Y(k) = \sum_{i=1}^{N} Y_i(k)$.

Definition 5.5.1. The optimal linear filter (5.33) is obtained by minimizing the following quadratic finite horizon cost function

$$J_K(M(\cdot)) = \sum_{k=1}^{K} \mathrm{tr}(Y(k)) = \sum_{k=1}^{K}\sum_{i=1}^{N} \mathrm{tr}(Y_i(k)), \qquad (5.35)$$

where $M(\cdot) = \{M_i(k), k = 0,\ldots,K-1\}_{i=1}^{N}$ are the filter gains and where $M_i(k)$ corresponds to $M_{\theta(k)}(k)$ when $\theta(k)$ is in mode $i$. We can give a similar definition for an optimal steady state filter using the infinite horizon quadratic cost function.

Definition 5.5.2. Assume that the matrices $\tilde{A}_i(k)$, $\tilde{C}_i(k)$ and $P(k)$ are constant for all $k \ge 0$. We say that the Markovian jump linear system (5.31) is mean square detectable if there exist gains $\{M_i\}_{i=1}^{N}$ such that $\lim_{k\to\infty} E[\|e(k)\|^2] = 0$, when the noises $w(k)$ and $v(k)$ are set to zero.

The  next  result  makes  the  connection  between  the  detectability  of  the  MJLS  defined 
above  and  the  distributed  detectability  of  the  process  (5.2). 

Proposition 5.5.1. If the Markovian jump linear system (5.31) is mean square detectable, then the linear stochastic system (5.2)-(5.3) is detectable in the sense of Definition 5.2.1.

Proof. In the context of this proposition, the dynamics of the estimation error for the MJLS (5.31) becomes

$$e(k+1) = \left(A - M_{\theta(k)}\tilde{C}_{\theta(k)}\right)e(k), \quad e(0) = e_0,$$

where $\tilde{C}_i = \frac{1}{\sqrt{\pi_i(0)}}C_i$. It is not difficult to check that the dynamic equations for the covariance matrices $\{Y_i(k)\}_{i=1}^{N}$ and the mean vectors $\{\mu_i(k)\}_{i=1}^{N}$ are given by

$$Y_i(k+1) = \sum_{j=1}^{N} p_{ij}\left(A - \frac{1}{\sqrt{\pi_j(0)}}M_jC_j\right)Y_j(k)\left(A - \frac{1}{\sqrt{\pi_j(0)}}M_jC_j\right)', \quad \text{with } Y_i(0) = Y_i^0, \qquad (5.36)$$

and

$$\mu_i(k+1) = \sum_{j=1}^{N} p_{ij}\left(A - \frac{1}{\sqrt{\pi_j(0)}}M_jC_j\right)\mu_j(k), \quad \mu_i(0) = \mu_i^0, \qquad (5.37)$$

for $i = 1,\ldots,N$. Since the MJLS is assumed mean square detectable, it follows that there exists a set of matrices $\{M_i\}_{i=1}^{N}$ such that (5.36) is asymptotically stable. But this also implies (see for instance Proposition 3.6 of [11]) that (5.37) is asymptotically stable as well. Setting $L_i = \frac{1}{\sqrt{\pi_i(0)}}M_i$, we see that (5.37) is identical to equation (5.7) and therefore (5.7) is asymptotically stable (when ignoring the noise). Hence, (5.2) is detectable in the sense of Definition 5.2.1. □


The next result establishes that the optimal gains of the filter (5.33) can be used to approximate the solution of the optimization problem (5.11).

Proposition 5.5.2. Let $M^*(\cdot) = \{M_i^*(k), k = 0,\ldots,K-1\}_{i=1}^{N}$ be the optimal gains of the linear filter (5.33). If we set $L_i(k) = \frac{1}{\sqrt{\pi_i(0)}}M_i^*(k)$ as filtering gains in the CBDLF scheme, then the filter cost function (5.10) is guaranteed to be upper bounded by

$$J_K(L(\cdot)) \le \sum_{k=0}^{K}\sum_{i=1}^{N} \frac{1}{\pi_i(0)}\,\mathrm{tr}(Y_i^*(k)), \qquad (5.38)$$

where $Y_i^*(k)$ are the covariance matrices resulting from minimizing (5.35).

Proof. By Theorem 5.5 of [11], the filtering gains that minimize (5.35) are given by

$$M_i^*(k) = \tilde{A}_i(k)Y_i^*(k)\tilde{C}_i(k)'\left[\pi_i(k)\tilde{D}_i(k)\tilde{D}_i(k)' + \tilde{C}_i(k)Y_i^*(k)\tilde{C}_i(k)'\right]^{-1}, \qquad (5.39)$$

for $i = 1,\ldots,N$, where $Y_i^*(k)$ satisfies

$$Y_i^*(k+1) = \sum_{j=1}^{N} p_{ij}(k)\Big[\tilde{A}_j(k)Y_j^*(k)\tilde{A}_j(k)' + \pi_j(k)\tilde{B}_j(k)\tilde{B}_j(k)' - \tilde{A}_j(k)Y_j^*(k)\tilde{C}_j(k)'\left(\pi_j(k)\tilde{D}_j(k)\tilde{D}_j(k)' + \tilde{C}_j(k)Y_j^*(k)\tilde{C}_j(k)'\right)^{-1}\tilde{C}_j(k)Y_j^*(k)\tilde{A}_j(k)'\Big]. \qquad (5.40)$$

In what follows we will show by induction that $Y_i^*(k) = \pi_i(0)Q_i^*(k)$ for all $i, k \ge 0$, where $Q_i^*(k)$ satisfies (5.26). For $k = 0$ we have $Y_i^*(0) = \pi_i(0)Y^*(0) = \pi_i(0)\Sigma_0 = \pi_i(0)Q_i^*(0)$. Let us assume that $Y_i^*(k) = \pi_i(0)Q_i^*(k)$. Then, from (5.32) we have

$$\begin{aligned}
&\pi_j(k)\tilde{B}_j(k)\tilde{B}_j(k)' = \pi_j(0)\Sigma_w(k), \qquad \pi_j(k)\tilde{D}_j(k)\tilde{D}_j(k)' = \Sigma_{v_j}(k), \\
&\pi_j(k)\tilde{D}_j(k)\tilde{D}_j(k)' + \tilde{C}_j(k)Y_j^*(k)\tilde{C}_j(k)' = \Sigma_{v_j}(k) + C_j(k)Q_j^*(k)C_j(k)'.
\end{aligned} \qquad (5.41)$$

Also,

$$M_i^*(k) = \sqrt{\pi_i(0)}\,A(k)Q_i^*(k)C_i(k)'\left[\Sigma_{v_i}(k) + C_i(k)Q_i^*(k)C_i(k)'\right]^{-1}, \qquad (5.42)$$

and from (5.25) we get that $M_i^*(k) = \sqrt{\pi_i(0)}\,L_i^*(k)$. From (5.40) and (5.41) it can be easily argued that $Y_i^*(k+1) = \pi_i(0)Q_i^*(k+1)$. By Corollary 5.4.1 we have that

$$J_K(L(\cdot)) \le \bar{J}_K(L(\cdot)),$$

for any set of filtering gains $L(\cdot)$ and in particular for $L_i(k) = \frac{1}{\sqrt{\pi_i(0)}}M_i^*(k) = L_i^*(k)$, for all $i$ and $k$. But since

$$\bar{J}_K(L^*(\cdot)) = \sum_{k=0}^{K}\sum_{i=1}^{N} \frac{1}{\pi_i(0)}\,\mathrm{tr}(Y_i^*(k)),$$

the result follows. □

Chapter 6


Conclusions 

In Chapter 2 we studied a multi-agent subgradient method under random communication topology. Under an i.i.d. assumption on the random process governing the evolution of the topology, we derived upper bounds on two performance metrics related to the CBMASM. The first metric reflects how close each agent can get to the optimal value. The second metric reflects how close and fast the agents' estimates of the decision vector can get to the minimizer of the objective function, and it was analyzed for a particular class of convex functions. All the aforementioned performance measures were expressed in terms of the probability distribution of the random communication topology. In addition, we showed how the distributed optimization algorithm can be used to perform collaborative system identification, an application which can be useful in collaborative tracking.

In Chapter 3 we emphasized the importance of the convexity concept and in particular the importance of the convex hull notion for reaching consensus. We did this by generalizing the asymptotic consensus problem to the case of convex metric spaces. For a group of agents taking values in a convex metric space, we introduced an iterative algorithm which ensures asymptotic convergence to agreement under some minimal assumptions for the communication graph. As an application, we provided an iterative algorithm which guarantees convergence to consensus of opinion.

In Chapter 4 we analyzed the convergence properties of the linear consensus problem, when the communication topology is modeled as a directed random graph with an underlying Markovian process. We addressed both the cases where the dynamics of the agents are expressed in continuous and discrete time. Under some assumptions on the communication topologies, we provided a rigorous mathematical proof for the intuitive necessary and sufficient conditions for reaching average consensus in the mean square and almost sure sense. These conditions are expressed in terms of connectivity properties of the union of graphs corresponding to the states of the Markov process. The aim of this work has been to show how mathematical techniques from the stability theory of the Markovian jump systems, in conjunction with results from matrix and graph theory, can be used to prove convergence results for consensus problems under a stochastic framework.

In Chapter 5 we first provided (testable) sufficient conditions under which stable consensus-based distributed linear filters can be obtained. Second, we gave a sub-optimal, linear filtering scheme, which can be implemented in a distributed manner and is valid for time varying communication topologies as well, and which guarantees a quantifiable level of performance. Third, under the assumption that the stochastic matrix used in the consensus step is doubly stochastic, we showed that if an appropriately defined Markovian jump linear system is detectable, then the stochastic process of our interest is detectable as well. We also showed that the optimal gains of the consensus-based distributed linear filter scheme can be approximated by using the optimal linear filter for the state estimation of a particular Markovian jump linear system.

As future directions, an immediate extension of the results of Chapter 2 is the generalization of the convergence analysis to the case where the communication topology is modeled by a Markovian random graph. The results introduced in Chapter 4 provide the appropriate framework to this end. In Chapter 5 we proposed a distributed algorithm for the state estimation of a process observed by a network of sensors. When considering wireless networks, another relevant problem is designing network architectures aimed at ensuring good estimation performance and network longevity. The problem increases in complexity if we require the solution to be obtained in a distributed manner. Due to the communication costs inherent to a wireless network, the network architecture should be the result of a tradeoff between the need for rich communication neighborhoods for obtaining accurate and stable estimates and the need for small communication neighborhoods for energy conservation. Our approach will consist in formulating the network architecture design problem as a constrained optimization problem which is solved in a distributed manner by the sensors. The main cost should reflect the relevance of the sensor measurements for the estimation process, while the constraints should reflect the limited energy available for communication and the need to ensure rich enough local neighborhoods for computing the state estimates.

As we showed in Chapters 2 and 5, the consensus problem represents a tool for localizing algorithms in distributed computing. Important optimization problems go beyond the realm of $\mathbb{R}^n$. For example, as we have mentioned in the introduction chapter, the trusted routing problem is formulated on the Max-plus semiring, while the design of network topology can be formulated on a Hamming space. We plan to continue the analysis started in Chapter 3, and formulate the consensus problem on semirings, and in particular on the Max-plus algebra. One of our goals is to explore the feasibility of using consensus to localize the algorithms used for solving optimization problems on spaces where the operations and relations are described by the Max-plus algebra, for example. A simple model for a graph link is obtained by assigning to the link a boolean value. By stacking all possible links, we obtain a vector whose entries can take zero/one values (corresponding to the non-existence or existence of links), and which lives in a Hamming space. As we have previously commented, designing communication topologies is an important problem in distributed optimization, estimation and control applications, in particular in the case of wireless networks, for which the resources are usually scarce. Another goal of ours is to study the possibility of using the consensus problem formulated on Hamming spaces for solving distributed optimization problems whose result should provide a network architecture, specifically designed for a particular task, such as estimation or optimization.

Appendix A


Discrete-Time  Coupled  Matrix  Equations 

A.1 Properties of a special class of difference matrix equations

Given a positive integer $N$, a sequence of positive numbers $p = \{p_{ij}\}_{i,j=1}^{N}$ and a set of matrices $F = \{F_i\}_{i=1}^{N}$, we consider the following matrix difference equations

$$W_i(k+1) = \sum_{j=1}^{N} p_{ij}F_jW_j(k)F_j', \quad W_i(0) = W_i^0, \quad i = 1,\ldots,N. \qquad (A.1)$$

Additionally, consider a similar set of matrix difference equations

$$W_i(k+1) = \sum_{j=1}^{N} p_{ji}F_j'W_j(k)F_j, \quad W_i(0) = W_i^0, \quad i = 1,\ldots,N. \qquad (A.2)$$

Proposition A.1.1. [9] The dynamics (A.1) are asymptotically stable if and only if the dynamics (A.2) are asymptotically stable.
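One way to examine this equivalence numerically is to vectorize both recursions and compare the spectral radii of the resulting linear operators. The sketch below assumes that (A.1) reads $W_i(k+1) = \sum_j p_{ij}F_jW_j(k)F_j'$ and that (A.2) reads $W_i(k+1) = \sum_j p_{ji}F_j'W_j(k)F_j$; the helper names `lifted_operator` and `spectral_radius` and the random test data are illustrative only.

```python
import numpy as np

def lifted_operator(P, F, dual=False):
    """Matrix acting on the stacked row-major vec(W_1), ..., vec(W_N).

    dual=False gives the operator of (A.1); dual=True gives that of (A.2).
    """
    N, n = len(F), F[0].shape[0]
    T = np.zeros((N * n * n, N * n * n))
    for i in range(N):
        for j in range(N):
            weight = P[j, i] if dual else P[i, j]
            Fj = F[j].T if dual else F[j]
            # vec(Fj W Fj') = kron(Fj, Fj) vec(W) for real matrices
            T[i * n * n:(i + 1) * n * n, j * n * n:(j + 1) * n * n] = weight * np.kron(Fj, Fj)
    return T

def spectral_radius(T):
    return np.max(np.abs(np.linalg.eigvals(T)))

rng = np.random.default_rng(5)
N, n = 3, 2
P = np.array([[0.6, 0.2, 0.2], [0.3, 0.4, 0.3], [0.1, 0.1, 0.8]])
F = [rng.normal(size=(n, n)) for _ in range(N)]

rho_primal = spectral_radius(lifted_operator(P, F))             # dynamics (A.1)
rho_dual = spectral_radius(lifted_operator(P, F, dual=True))    # dynamics (A.2)
```

Each recursion is asymptotically stable exactly when its spectral radius is below one, so equal radii are consistent with the statement of the proposition.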

Related  to  the  above  dynamic  equations,  we  introduce  the  following  stabilizability 
and  detectability  definitions. 

Definition A.1.1. [10] Given a set of matrices $C = \{C_i\}_{i=1}^{N}$, we say that $(p, C, A)$ is detectable if there exists a set of matrices $L = \{L_i\}_{i=1}^{N}$ such that the dynamics (A.1) is asymptotically stable, where $F_i = A_i - L_iC_i$, for $i = 1,\ldots,N$.

Definition A.1.2. [10] Given a set of matrices $C = \{C_i\}_{i=1}^{N}$, we say that $(A, C, p)$ is stabilizable if there exists a set of matrices $L = \{L_i\}_{i=1}^{N}$ such that the dynamics (A.1) is asymptotically stable, where $F_i = A_i - C_iL_i$, for $i = 1,\ldots,N$.

Remark A.1.1. Given a positive semi-definite matrix $X$ and a positive definite matrix $Y$, the following holds:

$$\min_i \lambda_i(Y)\,\mathrm{tr}(X) \le \mathrm{tr}(YX) \le \max_i \lambda_i(Y)\,\mathrm{tr}(X).$$
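The trace bounds of Remark A.1.1 are easy to check on random data. The following sketch draws a rank-deficient positive semi-definite $X$ and a positive definite $Y$; the sizes and the random seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
G = rng.normal(size=(n, 2))
X = G @ G.T                          # positive semi-definite, rank 2
H = rng.normal(size=(n, n))
Y = H @ H.T + 0.1 * np.eye(n)        # positive definite

lam = np.linalg.eigvalsh(Y)          # eigenvalues of the symmetric matrix Y
lower = lam.min() * np.trace(X)
middle = np.trace(Y @ X)
upper = lam.max() * np.trace(X)
```

The three numbers should satisfy `lower <= middle <= upper`, which is the form in which the remark is used in the Lyapunov argument of Proposition A.1.2.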


Proposition A.1.2. If there exists a set of symmetric positive definite matrices $\{V_i\}_{i=1}^{N}$ such that

$$V_i = \sum_{j=1}^{N} p_{ji}F_i'V_jF_i + S_i, \qquad (A.3)$$

for some set of symmetric positive definite matrices $\{S_i\}_{i=1}^{N}$, then the dynamics (A.1) are asymptotically stable.


Proof. We use the same idea as in the proof of Theorem 3.19 of [11] and define the following Lyapunov function

$$\Phi(k) = \sum_{i=1}^{N} \mathrm{tr}(W_i(k)V_i).$$

In the following we show that the difference $\Phi(k+1) - \Phi(k)$ is negative for all $k \ge 0$, from which we infer the asymptotic stability of (A.1). We get that

$$\begin{aligned}
\Phi(k+1) - \Phi(k) &= \mathrm{tr}\left(\sum_{i=1}^{N}\left(\left(\sum_{j=1}^{N} p_{ij}F_jW_j(k)F_j'\right)V_i - W_i(k)V_i\right)\right) \\
&= \mathrm{tr}\left(\sum_{i=1}^{N} W_i(k)\left(\sum_{j=1}^{N} p_{ji}F_i'V_jF_i - V_i\right)\right) = -\sum_{i=1}^{N} \mathrm{tr}(W_i(k)S_i).
\end{aligned}$$

Since $\{W_i(k)\}_{i=1}^{N}$ are positive semi-definite matrices for $k \ge 0$ and $\{S_i\}_{i=1}^{N}$ are positive definite, by Remark A.1.1, it follows that

$$\Phi(k+1) - \Phi(k) < 0, \quad k \ge 0.$$

□

Proposition A.1.3. If there exists a set of symmetric positive definite matrices $\{V_i\}_{i=1}^{N}$ such that

$$V_i = \sum_{j=1}^{N} p_{ij}F_iV_jF_i' + S_i, \qquad (A.4)$$

for some set of symmetric positive definite matrices $\{S_i\}_{i=1}^{N}$, then the dynamics (A.1) are asymptotically stable.

Proof. Using the same approach as in the previous proposition, we prove the asymptotic stability of the dynamics (A.2). Using Proposition A.1.1, the result follows. □
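The Lyapunov argument of Proposition A.1.2 can be illustrated numerically. The sketch below assumes the condition $V_i = \sum_j p_{ji}F_i'V_jF_i + S_i$, builds $V_i$ by fixed-point iteration for contractive $F_i$ (scaled to spectral norm 0.5, an assumption that makes the iteration converge), and compares the one-step change of $\Phi$ with $-\sum_i \mathrm{tr}(W_i(k)S_i)$. All names and numerical values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
N, n = 3, 2
P = np.array([[0.6, 0.2, 0.2], [0.3, 0.4, 0.3], [0.1, 0.1, 0.8]])
F = []
for _ in range(N):
    G = rng.normal(size=(n, n))
    F.append(0.5 * G / np.linalg.norm(G, 2))   # spectral norm equal to 0.5
S = [np.eye(n) for _ in range(N)]

# Solve V_i = sum_j p_ji F_i' V_j F_i + S_i by fixed-point iteration.
V = [np.eye(n) for _ in range(N)]
for _ in range(200):
    V = [sum(P[j, i] * F[i].T @ V[j] @ F[i] for j in range(N)) + S[i]
         for i in range(N)]

def phi(W):
    """Lyapunov function: sum of tr(W_i V_i)."""
    return sum(np.trace(Wi @ Vi) for Wi, Vi in zip(W, V))

W0 = []
for _ in range(N):
    G = rng.normal(size=(n, n))
    W0.append(G @ G.T)                         # positive semi-definite start
# One step of the dynamics (A.1)
W1 = [sum(P[i, j] * F[j] @ W0[j] @ F[j].T for j in range(N)) for i in range(N)]

decrease = phi(W1) - phi(W0)
expected = -sum(np.trace(Wi @ Si) for Wi, Si in zip(W0, S))
```

Agreement between `decrease` and `expected` reflects the index exchange carried out in the proof; the strict negativity follows from the positive definiteness of the matrices $S_i$.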

Proposition A.1.4. If the following linear matrix inequalities are feasible

$$\begin{pmatrix}
X_i & \sqrt{p_{1i}}\,X_iF_i' & \sqrt{p_{2i}}\,X_iF_i' & \cdots & \sqrt{p_{Ni}}\,X_iF_i' \\
\sqrt{p_{1i}}\,F_iX_i & X_1 & 0 & \cdots & 0 \\
\sqrt{p_{2i}}\,F_iX_i & 0 & X_2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\sqrt{p_{Ni}}\,F_iX_i & 0 & 0 & \cdots & X_N
\end{pmatrix} \succ 0, \qquad (A.5)$$

for $i = 1,\ldots,N$, where $\{X_i\}_{i=1}^{N}$ are the unknown variables, then the dynamics (A.1) are asymptotically stable.

Proof. By the Schur complement lemma, (A.5) are feasible if and only if

$$X_i - \sum_{j=1}^{N} p_{ji}X_iF_i'X_j^{-1}F_iX_i \succ 0, \quad X_i \succ 0, \quad i = 1,\ldots,N. \qquad (A.6)$$

By defining $V_i = X_i^{-1}$, $i = 1,\ldots,N$, and multiplying (A.6) on both sides by $X_i^{-1}$, (A.6) becomes

$$V_i - \sum_{j=1}^{N} p_{ji}F_i'V_jF_i \succ 0, \quad V_i \succ 0, \quad i = 1,\ldots,N.$$

By Proposition A.1.2, (A.1) is asymptotically stable. □

Inspired by Proposition A.1.4, detectability and stabilizability tests, in the sense of Definitions A.1.1 and A.1.2, respectively, can be formulated in terms of the feasibility of a set of linear matrix inequalities.

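Proposition A.1.5 (detectability test). If the following matrix inequalities are feasible

$$\begin{pmatrix}
X_i & \sqrt{p_{i1}}\,(X_iA_i - Y_iC_i) & \sqrt{p_{i2}}\,(X_iA_i - Y_iC_i) & \cdots & \sqrt{p_{iN}}\,(X_iA_i - Y_iC_i) \\
\sqrt{p_{i1}}\,(X_iA_i - Y_iC_i)' & X_1 & 0 & \cdots & 0 \\
\sqrt{p_{i2}}\,(X_iA_i - Y_iC_i)' & 0 & X_2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\sqrt{p_{iN}}\,(X_iA_i - Y_iC_i)' & 0 & 0 & \cdots & X_N
\end{pmatrix} \succ 0, \qquad (A.7)$$

for $i = 1,\ldots,N$, where $\{X_i\}_{i=1}^{N}$ and $\{Y_i\}_{i=1}^{N}$ are the unknown variables, then $(p, C, A)$ is detectable in the sense of Definition A.1.1. Moreover, choosing $L_i = X_i^{-1}Y_i$, for $i = 1,\ldots,N$, the dynamics (A.1) are asymptotically stable.

Proof. By the Schur complement lemma, (A.7) are feasible if and only if

$$X_i - \sum_{j=1}^{N} p_{ij}(X_iA_i - Y_iC_i)X_j^{-1}(X_iA_i - Y_iC_i)' \succ 0, \quad X_i \succ 0, \quad i = 1,\ldots,N. \qquad (A.8)$$

By defining $L_i = X_i^{-1}Y_i$ and $V_i = X_i^{-1}$, $i = 1,\ldots,N$, (A.8) becomes

$$V_i - \sum_{j=1}^{N} p_{ij}F_iV_jF_i' \succ 0, \quad V_i \succ 0, \quad i = 1,\ldots,N.$$

By Proposition A.1.3, $(p, C, A)$ is detectable in the sense of Definition A.1.1. □

Proposition A.1.6 (stabilizability test). If the following matrix inequalities are feasible

$$\begin{pmatrix}
X_i & \sqrt{p_{1i}}\,(A_iX_i - C_iY_i)' & \sqrt{p_{2i}}\,(A_iX_i - C_iY_i)' & \cdots & \sqrt{p_{Ni}}\,(A_iX_i - C_iY_i)' \\
\sqrt{p_{1i}}\,(A_iX_i - C_iY_i) & X_1 & 0 & \cdots & 0 \\
\sqrt{p_{2i}}\,(A_iX_i - C_iY_i) & 0 & X_2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\sqrt{p_{Ni}}\,(A_iX_i - C_iY_i) & 0 & 0 & \cdots & X_N
\end{pmatrix} \succ 0, \qquad (A.9)$$

for $i = 1,\ldots,N$, where $\{X_i\}_{i=1}^{N}$ and $\{Y_i\}_{i=1}^{N}$ are the unknown variables, then $(A, C, p)$ is stabilizable in the sense of Definition A.1.2. Moreover, choosing $L_i = Y_iX_i^{-1}$, for $i = 1,\ldots,N$, the dynamics (A.1) are asymptotically stable.

Proof. By the Schur complement lemma, (A.9) are feasible if and only if

$$X_i - \sum_{j=1}^{N} p_{ji}(A_iX_i - C_iY_i)'X_j^{-1}(A_iX_i - C_iY_i) \succ 0, \quad X_i \succ 0, \quad i = 1,\ldots,N. \qquad (A.10)$$

By defining $L_i = Y_iX_i^{-1}$ and $V_i = X_i^{-1}$, $i = 1,\ldots,N$, so that $A_iX_i - C_iY_i = F_iX_i$ with $F_i = A_i - C_iL_i$, (A.10) becomes

$$V_i - \sum_{j=1}^{N} p_{ji}F_i'V_jF_i \succ 0, \quad V_i \succ 0, \quad i = 1,\ldots,N.$$

By Proposition A.1.2, $(A, C, p)$ is stabilizable in the sense of Definition A.1.2. □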

A.2  Discrete-time  coupled  Riccati  equations 

Consider the following coupled Riccati difference equations

$$Q_i(k+1) = \sum_{j=1}^{N} p_{ij}\left(A_jQ_j(k)A_j' - A_jQ_j(k)C_j'\left(C_jQ_j(k)C_j' + \Sigma_{v_j}\right)^{-1}C_jQ_j(k)A_j' + \Sigma_w\right), \qquad (A.11)$$

$Q_i(0) = Q_i^0 \succ 0$, $i = 1,\ldots,N$, where $\{\Sigma_{v_i}\}_{i=1}^{N}$ and $\Sigma_w$ are symmetric positive definite matrices.




Proposition A.2.1. Let $\Sigma_w^{1/2}$ denote a matrix satisfying $\Sigma_w = \Sigma_w^{1/2}\Sigma_w^{1/2}$. Suppose that $(p, C, A)$ is detectable and that $(A, \Sigma_w^{1/2}, p)$ is stabilizable in the sense of Definitions A.1.1 and A.1.2, respectively. Then there exists a unique set of symmetric positive definite matrices $\bar{Q} = \{\bar{Q}_i\}_{i=1}^{N}$ satisfying

$$\bar{Q}_i = \sum_{j=1}^{N} p_{ij}\left(A_j\bar{Q}_jA_j' - A_j\bar{Q}_jC_j'\left(C_j\bar{Q}_jC_j' + \Sigma_{v_j}\right)^{-1}C_j\bar{Q}_jA_j' + \Sigma_w\right), \quad i = 1,\ldots,N. \qquad (A.12)$$

Moreover, for any initial conditions $Q_i^0 \succ 0$, we have that $\lim_{k\to\infty} Q_i(k) = \bar{Q}_i$.

Proof. The proof can be mimicked after the proof of Theorem 1 of [10]. Compared to our case, in Theorem 1 of [10], scalar terms, taking values between zero and one, multiply the matrices $\Sigma_{v_j}$ in (A.12). However, it is not difficult to note that the result holds even in the case where these scalar terms take the value one, which corresponds to our setup. □




Bibliography 


[1]  T.  Ayasal,  M.  Yildiz,  A.  Sarwate,  and  A.  Scaglione.  Broadcast  gossip  algorithms  for 
consensus.  IEEE  Transaction  on  Signal  Processing,  57(7) : 2748—27 6 1 ,  July  2009. 

[2]  M.  Rabi  B.  Johansson  and  K.H.  Johansson.  A  randomized  incremental  subgradi¬ 
ent  method  for  distributed  optimization  in  networked  systems.  SIAM  Journal  on 
Optimization,  20(3):  1157-1 170,  2009. 

[3]  J.S.  Baras  and  R  Hovareshti.  Effects  of  topology  in  networked  systems:  Stochastic 
methods  and  small  worlds.  Proceedings  of  the  47th  IEEE  Conference  on  Decision 
and  Control,  pages  2973-2978,  Dec  2008. 

[4]  V.D.  Blondel,  J.M.  Hendrickx,  A.  Olshevsky,  and  J.N.  Tsitsiklis.  Convergence  in 
multiagent  coordination,  consensus,  and  flocking.  Proceedings  of  the  44th  IEEE 
Conference  on  Decision  and  Control,  pages  2996-3000,  Dec  2005. 

[5]  V.  Borkar  and  P.  Varaya.  Asymptotic  agreement  in  distributed  estimation.  IEEE 
Trans.  Autom.  Control,  AC-27(3):650-655,  Jun  1982. 

[6]  S.  Boyd,  P.  Diaconis,  and  L.  Xiao.  Fastest  mixing  markov  chain  on  a  graph.  SIAM 
Review,  46(4): 667-689,  Dec  2004. 

[7]  S.  Boyd,  A.  Ghosh,  B.  Prabhakar,  and  D.  Shah.  Randomized  gossip  algorithms. 
IEEE/ACM  Trans.  Netw.,  14(SI):2508-2530,  2006. 

[8]  R.  Carli,  A.  Chiuso,  L.  Schenato,  and  S.  Zampieri.  Distributed  kalman  filtering 
based  on  consensus  strategies.  IEEE  Journal  on  Selected  Area  in  Communication, 
26(4):622-633,  May  2008. 

[9]  O.L.V.  Costa  and  M.D.  Fragoso.  Stability  results  for  discrete-time  linear  systems 
with  markovian  jumping  parameters.  Journal  of  Mathematical  Analysis  and  Appli¬ 
cation,  179:154-178, 1993. 

[10]  O.F.V.  Costa  and  M.D.  Fragoso.  Discrete-time  coupled  riccati  equations  for  sys¬ 
tems  with  markov  switching  parameters.  Journal  of  Mathematical  Analysis  and 
Application,  194:197-216,  1995. 

[11]  O.F.V.  Costa,  M.D.  Fragoso,  and  R.P  Marques.  Discrete-Time  Markov  Jump  Linear 
Systems.  Springer- Verlag,  Fondon,  2005. 

[12]  A.G.  Dimakis,  S.  Kar,  J.M.F.  Moura,  M.G.  Rabbat,  and  A.  Scaglione.  Gossip  algo¬ 
rithms  for  distributed  signal  processing.  arXiv:1003.5309vl  [cs.DC],  March  2010. 

[13]  J.C.  Duchi,  A.  Agarwal,  and  M.J.  Wainwright.  Distributed  dual  averaging  in 
netwroks.  Proceedings  of  the  2010  Neural  Information  System  Foundation  Con¬ 
ference,  December  2010. 


162 


[14]  J.A.  Fax  and  R.M.  Murray.  Information  flow  and  cooperative  control  of  vehicles 
formations.  IEEE  Trans.  Autom.  Control ,  49(9):  1456-1476,  Sept  2004. 

[15]  X.  Feng  and  K.A.  Loparo.  Stability  of  linear  markovian  systems.  Proceedings  of 
the  29th  IEEE  Conference  on  Decision  and  Control,  pages  1408-1413,  Dec  1990. 

[16]  M.D.  Fragoso  and  O.L.V.  Costa.  A  unified  approach  for  stochastic  and  mean-square 
stability  of  continuous-time  linear  systems  with  markovian  jumping  parameters  and 
additive  disturbances.  SIAM  Journal  on  Control  and  Optimization,  44(4):  1 165— 
1191,2005. 

[17]  Y.  Hatano  and  M.  Mesbahi.  Agreement  over  random  networks.  IEEE  Trans.  Autom. 
Control,  50(1 1):  1867—1872,  2005. 

[18]  A.  Jadbabaie,  J.  Lin,  and  A.S.  Morse.  Coordination  of  groups  of  mobile  autonomous 
agents  using  nearest  neighbor.  IEEE  Trans.  Autom.  Control,  48(6):998-1001,  Jun 
2004. 

[19]  B.  Johansson.  On  distributed  optimization  in  network  systems.  Ph.D.  dissertation 
in  Telecommunication,  2008. 

[20]  B.  Johansson,  T.  Keviczky,  M.  Johansson,  and  K.H.  Johansson.  Subgradient  meth¬ 
ods  and  consensus  algorithms  for  solving  convex  optimization  problems.  Proceed¬ 
ings  of  the  47th  IEEE  Conference  on  Decision  and  Control,  pages  4185-4190,  Dec 
2008. 

[21]  S.  Kandukuri  and  S.  Boyd.  Optimal  power  control  in  interference-limited  fading 
wireless  channels  with  outage-probability  specifications.  IEEE  Transactions  on 
Wireless  Communications,  1(1),  Jan  2002. 

[22] I. Lobel and A. Ozdaglar. Convergence analysis of distributed subgradient methods
over random networks. Proceedings of 46th Allerton Conference on Communication,
Control, and Computing, pages 353-360, Sept 2008.

[23] I. Matei, N. Martins, and J. Baras. Almost sure convergence to consensus in
markovian random graphs. Proceedings of the 47th IEEE Conference on Decision and
Control, pages 3535-3540, Dec 2008.

[24] I. Matei, N. Martins, and J. Baras. Optimal state estimation for discrete-time
markovian jump linear systems, in the presence of delayed mode observations.
Proceedings of the 2008 IEEE American Control Conference, pages 3560-3565, Jun 2008.

[25] I. Matei, N. Martins, and J. Baras. Optimal state estimation for discrete-time
markovian jump linear systems, in the presence of delayed output observations.
Proceedings of the IEEE 2008 Information Theory Workshop, pages 237-242, May 2008.

[26] I. Matei, N. Martins, and J. Baras. Consensus problems with directed markovian
communication patterns. Proceedings of the 2009 IEEE American Control
Conference, pages 1298-1303, Jun 2009.




[27] A. Moraglio, C. Di Chio, J. Togelius, and R. Poli. Geometric particle swarm
optimization. Journal of Artificial Evolution and Applications, Article ID 143624, 14
pages, 2008.

[28] A. Moraglio and J. Togelius. Geometric particle swarm optimization for the sudoku
puzzle. Proceedings of the 9th Conference on Genetic and Evolutionary
Computation, pages 118-125, 2007.

[29] L. Moreau. Stability of multi-agent systems with time-dependent communication
links. IEEE Trans. Autom. Control, 50(2):169-182, Feb 2005.

[30]  A.  Nedic  and  D.  Bertsekas.  Convergence  rate  of  incremental  subgradient  algorithm. 
Stochastic  Optimization:  Algorithms  and  Applications,  pages  263-304,  2000. 

[31] A. Nedic and D. Bertsekas. Incremental subgradient methods for nondifferentiable
optimization. SIAM Journal on Optimization, 12(1):109-138, 2001.

[32] A. Nedic and A. Ozdaglar. Convergence rate for consensus with delays. Journal of
Global Optimization, 47(3):437-456, Jul 2010.

[33] A. Nedic, A. Ozdaglar, and P.A. Parrilo. Constrained consensus and optimization in
multi-agent networks. IEEE Trans. Autom. Control, 55(4):922-938, Apr 2010.

[34] A. Nedic and A. Ozdaglar. Distributed subgradient methods for multi-agent
optimization. IEEE Trans. Autom. Control, 54(1):48-61, Jan 2009.

[35] C.T.K. Ng, M. Medard, and A. Ozdaglar. Completion time minimization and robust
power control in wireless packet networks. arXiv:0812.3447v1 [cs.IT], Dec 2008.

[36] A. Olshevsky and J.N. Tsitsiklis. Convergence speed in distributed consensus and
averaging. SIAM Journal on Control and Optimization, 48(1):33-55, Sept 2009.

[37]  B.T.  Polyak.  Introduction  to  Optimization.  Optimization  Software,  Inc,  Publications 
Division,  New  York,  1987. 

[38] M. Porfiri and D.J. Stilwell. Consensus seeking over random directed weighted
graphs. IEEE Trans. Autom. Control, 52(9):1767-1773, Sept 2007.

[39] W. Ren and R.W. Beard. Consensus seeking in multi-agent systems under
dynamically changing interaction topologies. IEEE Trans. Autom. Control, 50(5):655-661,
May 2005.

[40]  R.O.  Saber.  Distributed  kalman  filter  with  embedded  consensus  filters.  Proceedings 
of  the  44th  IEEE  Conference  on  Decision  and  Control,  2005. 

[41]  R.O.  Saber.  Distributed  kalman  filtering  for  sensor  networks.  Proceedings  of  the 
46th  IEEE  Conference  on  Decision  and  Control,  pages  5492-5498,  2007. 

[42]  R.O.  Saber  and  R.M.  Murray.  Consensus  protocols  for  networks  of  dynamic  agents. 
Proceedings  of  the  2003  IEEE  American  Control  Conference,  pages  951-956,  2003. 




[43] R.O. Saber and R.M. Murray. Consensus problem in networks of agents with
switching topology and time-delays. IEEE Trans. Autom. Control, 49(9):1520-1533, Sept
2004.

[44] A. Tahbaz Salehi and A. Jadbabaie. Necessary and sufficient conditions for
consensus over random networks. IEEE Trans. Autom. Control, 53(3):791-795, Apr
2008.

[45] A. Tahbaz Salehi and A. Jadbabaie. Consensus over ergodic stationary graph
processes. IEEE Trans. Autom. Control, 55(1):225-230, Jan 2010.

[46] B.K. Sharma and C.L. Dewangan. Fixed point theorem in convex metric space. Novi
Sad Journal of Mathematics, 25(1):9-18, 1995.

[47] K.K. Somasundaram and J.S. Baras. Path Optimization Techniques for Trusted
Routing in Tactical Mobile Ad-Hoc Networks. ISR Technical Report 2009-3, 2009.

[48] A. Speranzon, C. Fischione, K.H. Johansson, and A. Sangiovanni-Vincentelli. A
distributed minimum variance estimator for sensor networks. IEEE Journal on
Selected Areas in Communications, 26(4):609-621, May 2008.

[49] W. Takahashi. A convexity in metric space and non-expansive mappings I. Kodai
Math. Sem. Rep., 22:142-149, 1970.

[50] D. Teneketzis and P. Varaiya. Consensus in distributed estimation. Advances in
Statistical Signal Processing, pages 361-386, Jan 1988.

[51]  J.N.  Tsitsiklis.  Problems  in  decentralized  decision  making  and  computation.  Ph.D. 
dissertation,  Dept.  Electr.  Eng.,  Nov  1984. 

[52] J.N. Tsitsiklis, D.P. Bertsekas, and M. Athans. Distributed asynchronous
deterministic and stochastic gradient optimization algorithms. IEEE Trans. Autom. Control,
31(9):803-812, Sept 1986.

[53] D.J. Watts and S.H. Strogatz. Collective dynamics of small world networks. Nature,
393:440-442, 1998.

[54]  J.  Wolfowitz.  Product  of  indecomposable  aperiodic  stochastic  matrices.  Proc.  Am. 
Math.  Soc.,  15:733-736,  1963. 




